




A FIRST COURSE IN 

MATHEMATICAL 

STATISTICS 


BY 

C. E. WEATHERBURN 

M.A., D.Sc., Hon. LL.D. 

Emeritus Professor of Mathematics in the University 
of Western Australia 



THE ENGLISH LANGUAGE BOOK SOCIETY 

AND 

CAMBRIDGE UNIVERSITY PRESS 



PUBLISHED FOR

THE ENGLISH LANGUAGE BOOK SOCIETY

BY

THE SYNDICS OF THE CAMBRIDGE UNIVERSITY PRESS
Bentley House, 200 Euston Road, London, N.W. 1


First Edition 1946

Second Edition 1949

E.L.B.S. Edition 1961

Reprinted 1962


Printed in Great Britain at the University Press, Cambridge

(Brooke Crutchley, University Printer)



PROFESSOR R. A. FISHER 

IN ADMIRATION OF HIS 

CONTRIBUTIONS TO MATHEMATICAL STATISTICS
AND TO THE MEMORY OF

PROFESSOR KARL PEARSON 

WHO PLAYED AN OUTSTANDING PART 
IN THE EARLIER DEVELOPMENT OF THE SUBJECT 




CONTENTS


Preface

Chapter I

FREQUENCY DISTRIBUTIONS

1. Arithmetic mean. Partition values
2. Change of origin and unit
3. Variance. Standard deviation
4. Moments
5. Grouped distribution
6. Continuous distributions
Examples I

Chapter II

PROBABILITY AND PROBABILITY DISTRIBUTIONS

7. Explanation of terms. Measure of probability
8. Theorems of total and compound probability
9. Probability distributions. Expected value
10. Expected value of a sum or a product of two variates
11. Repeated trials. Binomial distribution
12. Continuous probability distributions
13. Theorems of Tchebychef and Bernoulli
14. Empirical definition of probability
15. Moment generating function and characteristic function
16. Cumulative function of a distribution
Examples II

Chapter III

SOME STANDARD DISTRIBUTIONS

Binomial and Poissonian Distributions

17. The binomial distribution
18. Poisson’s distribution

The Normal Distribution

19. Derivation from the binomial distribution
20. Some properties of the normal distribution
21. Probabilities and relative frequencies for various intervals
22. Distribution of a sum of independent normal variates
Examples III
Mathematical Notes

Chapter IV

BIVARIATE DISTRIBUTIONS.
REGRESSION AND CORRELATION

23. Discrete distributions. Moments
24. Continuous distributions
25. Lines of regression
26. Coefficient of correlation. Standard error of estimate
27. Estimates from the regression equation
28. Change of units
29. Numerical illustration
30. Correlation of ranks
31. Bivariate probability distributions
32. Variance of a sum of variates
Examples IV

Chapter V

FURTHER CORRELATION THEORY.
CURVED REGRESSION LINES

33. Arrays. Linear regression
34. Correlation ratios
35. Calculation of correlation ratios
36. Other relations
37. Continuous distributions
38. Bivariate normal distribution
39. Intraclass correlation

Curved Regression Lines

40. Polynomial regression. Normal equations
41. Index of correlation
42. Some related regressions
Examples V

Chapter VI

THEORY OF SIMPLE SAMPLING

43. Random sampling from a population

Sampling of Attributes

44. Simple sampling of attributes
45. Large samples. Test of significance
46. Comparison of large samples
47. Poissonian and Lexian sampling. Samples of varying size

Sampling of Values of a Variable

48. Random and simple sampling
49. Sampling distributions. Standard errors
50. Sampling distribution of the mean
51. Normal population. Fiducial limits for unknown mean
52. Comparison of the means of two large samples
53. Standard error of a partition value
Examples VI

Chapter VII

STANDARD ERRORS OF STATISTICS

54. Notation. Variances of population and sample
55. Standard errors of class frequencies
56. Covariance of the frequencies in different classes
57. Standard errors in moments about a fixed value
58. Covariance of moments of different orders about a fixed value
59. Standard errors of the variance and the standard deviation of a large sample
60. Comparison of the standard deviations of two large samples

Sampling from a Bivariate Population

61. Sampling covariance of the means of the variables
62. Variance and covariance of moments about a fixed point
63. Standard error of the covariance of a large sample
64. Standard error of the coefficient of correlation
Examples VII

Chapter VIII

BETA AND GAMMA DISTRIBUTIONS

65. Beta and Gamma functions
66. Relation between the two functions
67. Gamma distribution and Gamma variates
68. Sum of independent Gamma variates
69. Beta distribution of the first kind
70. Alternative proof of theorems
71. Product of a $\beta_1(l, m)$ variate and a $\gamma(l+m)$ variate
72. Beta distribution of the second kind
73. Quotient of independent Gamma variates
Examples VIII

Chapter IX

CHI-SQUARE AND SOME APPLICATIONS

74. Chi-square and its distribution
75. Orthogonal linear transformation
76. Linear constraints. Degrees of freedom
77. Distribution of the ‘sum of squares’ for a random sample from a normal population
78. Nature of the chi-square test. An illustration
79. Test of goodness of fit
80. Numerical examples
81. Additive property of chi-square
82. Samples from an uncorrelated bivariate normal population. Distribution of the correlation coefficient
83. Distribution of regression coefficients and correlation ratios
Examples IX

Chapter X

FURTHER TESTS OF SIGNIFICANCE.
SMALL SAMPLES

84. Small samples

‘Student’s’ Distribution

85. The statistic t and its distribution
86. Test for an assumed population mean
87. Fiducial limits for the population mean
88. Comparison of the means of two samples
89. Significance of an observed correlation
90. Significance of an observed regression coefficient
91. Distribution of the range of a sample

Distribution of the Variance Ratio

92. Ratio of independent estimates of the population variance
93. Fisher’s z distribution. Table of F

Fisher’s Transformation of the Correlation Coefficient

94. Distribution of r. Fisher’s transformation
95. Comparison of correlations in independent samples
96. Combination of estimates of a correlation coefficient
Examples X

Chapter XI

ANALYSIS OF VARIANCE AND COVARIANCE

Analysis of Variance

97. Resolution of the ‘sum of squares’
98. Homogeneous population. One criterion of classification
99. Calculation of the sums of squares
100. Two criteria of classification
101. The Latin square. Three criteria of classification
102. Significance of an observed correlation ratio
103. Significance of a regression function
104. Test for non-linearity of regression

Analysis of Covariance

105. Resolution of the ‘sum of products’. One criterion of classification
106. Calculation of the sums of products
107. Examination and elimination of the effect of regression
108. Two criteria of classification
Examples XI

Chapter XII

MULTIVARIATE DISTRIBUTIONS.
PARTIAL AND MULTIPLE CORRELATIONS

109. Introductory. Yule’s notation
110. Distribution of three or more variables
111. Determination of the coefficients of regression
112. Multiple correlation
113. Partial correlation
114. Reduction formula for the order of a standard deviation
115. Reduction formula for the order of a regression coefficient
116. Normal distribution
117. Significance of an observed partial correlation
118. Significance of an observed multiple correlation
Examples XII

Literature for Reference

Index

TABLES

1. Ordinates of the Normal Curve
2. Area under the Normal Curve
3. Table of Chi-square
4. Table of t
5. Significant values of r
6. Table of F. The 5 and 1 % ‘points’
7. Table of r and z. Fisher’s transformation



PREFACE 


The object of this work is to provide a mathematical text on the 
Theory of Statistics, adapted to the needs of the student with an 
average mathematical equipment, including an ordinary knowledge 
of the Integral Calculus. The subject treated in the following 
pages is best described not as Statistical Methods but as Statistical 
Mathematics, or the mathematical foundations of the interpretation
of statistical data. The writer’s aim is to explain the underlying 
principles, and to prove the formulae and the validity of the 
methods which are the common tools of statisticians. Numerous 
examples are given to illustrate the use of these formulae; but, in 
nearly all cases, heavy arithmetic is purposely avoided in the desire 
to focus the attention on the principles and proofs, rather than on 
the details of numerical calculation. 

The treatment is based on a course of about sixty lectures on 
Statistical Mathematics, which the author has given annually in 
the University of Western Australia for several years. This course 
was undertaken at the request of the heads of some of the science 
departments, who desired for their students a more mathematical 
treatment of the subject than those usually provided by courses 
on Statistical Methods. The class has included graduates and 
undergraduates whose researches and studies were in Agriculture, 
Biology, Economics, Psychology, Physics and Chemistry. On 
account of such a diversity of interest the lectures were designed 
to provide a mathematical basis, suitable for work in any of the 
above subjects. No technical knowledge of any particular subject 
was assumed. 

The first five chapters deal with the properties of distributions 
in general, and of some standard distributions in particular. It is 
desirable that the student become familiar with these, before 
being confronted with the theory of sampling, in which he is 
required to consider two or more distributions simultaneously.
However, the reader who wishes to make an earlier start with
sampling theory may take Chapters vi and vii (as far as §60)
immediately after Chapter iii, since these are independent of the
theory of Correlation. The theory of Partial and Multiple Correlations
has been left for the final chapter, in order not to delay the
study of sampling theory and tests of significance. But those who
wish to study this subject earlier may read this chapter immediately
after Chapter v, or even after Chapter iv. This order,
however, is not recommended for the beginner.

A feature of the book is the use of the properties of Beta and 
Gamma variates in proving the sampling distributions of the 
statistics, which are the basis of the common tests of significance. 
Consequently a special chapter (viii) is devoted to the properties
of these variates and their distributions. The treatment is simple,
and does not assume any previous acquaintance with the Beta and
Gamma functions. The author believes that the use of these
variates brings both simplicity and cohesion to the theory. The
student is strongly urged to master the theorems of Chapter viii
before proceeding to tests of significance. In the preparation of 
this chapter, and the following one, much help was derived from 
the study of a recent paper by D. T. Sawkins.* 

The considerations which determined the presentation of the
subject of Probability in Chapter ii are the mathematical attainments
of the students for whom the book is intended, and their
requirements in studying the remaining chapters. The approach
decided on is substantially that of the classical theory. After the
proof of Bernoulli’s theorem, the relation between the a priori
definition of probability and the statistical (or empirical) definition
is considered; and the measure of probability by a relative frequency
in each case is emphasized. If the book had been intended
primarily for mathematical specialists, a different presentation of
the theory would have been given. But it is futile to expect the
average research worker to appreciate an exposition like
Kolmogoroff’s† or Cramér’s,‡ based on the theory of completely additive
set functions. The elementary properties of the moment generating
function and the cumulative function are also given in Chapter ii,
and are used throughout the book. These functions are introduced,
not as essential concepts, but as useful instruments which lead to
simpler proofs of various theorems. What has been written about
them should convince the reader that they deserve his careful
attention.

* ‘Elementary presentation of the frequency distributions of certain
statistical populations associated with the normal population.’ Journ. and
Proc. Roy. Soc. N.S.W. vol. 74, pp. 209-39. By D. T. Sawkins, Reader in
Statistics at Sydney University.

† Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

‡ Random Variables and Probability Distributions, University Press,
Cambridge, 1937.

I wish to express my appreciation of the care bestowed on this 
book by the staff of the Cambridge University Press, and my 
pleasure in the excellence of the printing. My thanks are due to 
Professor R. A. Fisher and Messrs Oliver and Boyd for permission 
to print Tables 3, 4, 5 and 7, which are drawn from fuller tables
in Fisher’s Statistical Methods for Research Workers. I am also
indebted to Professor G. W. Snedecor and the Iowa Collegiate
Press for permission to reproduce Table 6, which is extracted from
a more complete table in Snedecor’s Statistical Methods. Lastly
I wish to thank Mr D. T. Sawkins for help received in correspondence
concerning statistical theory, and Mr Frank Gamblen
for assistance in reading the final proof. 

C.E.W. 


PERTH, W.A. 
April, 1946 


NOTE ON THE SECOND EDITION 

The call for reprinting has given an opportunity to correct a 
number of small errors and misprints throughout the book, and 
to add a new reference here and there. Paragraph 91 on page 196 
is new to this edition. 


1949 


C.E.W. 




CHAPTER I 


FREQUENCY DISTRIBUTIONS 

1. Arithmetic mean. Partition values

Consider a group of $N$ persons in receipt of wages. Let $x$ shillings
be the wage of an individual on some specified day. Then, in general,
$x$ will be a variable whose value changes with the individual.
Possibly the $N$ values of $x$ will not all be different. Suppose there are
only $n$ different values $x_1, x_2, \ldots, x_n$, which occur respectively
$f_1, f_2, \ldots, f_n$ times. The numbers $f_i$, in which the subscript $i$ takes the
positive integral values 1, 2, ..., $n$, are called the frequencies of the
values $x_i$ of the variable $x$; and the assemblage of values $x_i$, with their
associated frequencies, is the frequency distribution of wages for that
group of persons on the day specified. The sum of the frequencies is
clearly equal to the number of persons in the group, so that

$$N = f_1 + f_2 + \cdots + f_n = \sum_{i=1}^{n} f_i, \tag{1}$$

where, as the equation indicates, $\sum_{i=1}^{n} f_i$ denotes the sum of the $n$
frequencies $f_i$, $i$ taking the integral values 1 to $n$. When the range of
values of $i$ is understood, the sum will be denoted simply by $\sum_i f_i$ or
$\sum f_i$. The number $N$ is the total frequency. The mean of the distribution
is the arithmetic mean $\bar{x}$ of the $N$ values of the variable, and is
therefore given by

$$\bar{x} = \frac{1}{N}(f_1 x_1 + f_2 x_2 + \cdots + f_n x_n) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \tag{2}$$

since the value $x_i$ occurs $f_i$ times. This formula expresses what is
meant by saying that $\bar{x}$ is the weighted mean of the different values $x_i$,
whose weights are their frequencies $f_i$.

Frequency distributions of many different variables will occur in
the following pages. Thus the values of the variable $x$ may be the
heights, or the weights, or the ages of a group of persons, or the
yields of grain per acre from a number of plots of land. For each
finite distribution $f_i$ will denote the frequency of the value $x_i$. The
total frequency $N$ is then given by (1), and the mean $\bar{x}$ of the
distribution by (2).

Example. The student to whom the above summation notation is new
may profitably verify the following relations. If $a$ is a constant,

$$\sum_i a f_i = a \sum_i f_i = Na,$$

$$\sum_i f_i (x_i + a) = \sum_i f_i x_i + Na,$$

$$\sum_i f_i (x_i + a)^2 = \sum_i f_i x_i^2 + 2a \sum_i f_i x_i + Na^2.$$

If $(x_i, y_i)$ is a pair of corresponding values of two variables, $x$ and $y$, with
frequency $f_i$,

$$\sum_i f_i (x_i + y_i) = \sum_i f_i x_i + \sum_i f_i y_i.$$

The symbol used as a subscript in connection with summation is immaterial;
but $i$, $j$, $r$, $s$, $t$ are perhaps most commonly employed.

Suppose that the frequency distribution of $x$ consists of $k$ partial
or component distributions, $\bar{x}_j$ being the mean of the $j$th component
and $n_j$ its total frequency, so that

$$N = \sum_{j=1}^{k} n_j.$$

Then that part of the sum $\sum_i f_i x_i$ which belongs to the $j$th component
has the value $n_j \bar{x}_j$, and the relation (2) is equivalent to

$$\bar{x} = \frac{1}{N}\sum_{j=1}^{k} n_j \bar{x}_j. \tag{3}$$

Consequently the mean of the whole distribution is the weighted
mean of the means of its components, the weights being the total
frequencies in those components.
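
Relations (2) and (3) may be checked numerically. The following is a minimal sketch in Python, not part of the original text; the wage values and frequencies are hypothetical, and the helper name `mean` is an assumption made for illustration. Exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

def mean(values, freqs):
    """Weighted mean as in (2): sum of f_i * x_i divided by N = sum of f_i."""
    n_total = sum(freqs)
    return Fraction(sum(f * x for x, f in zip(values, freqs)), n_total)

# Hypothetical wages x_i (shillings) with their frequencies f_i.
values = [10, 12, 15, 20]
freqs = [3, 5, 8, 4]

# Split the distribution into two components and recombine as in (3).
comp_a = (values[:2], freqs[:2])
comp_b = (values[2:], freqs[2:])
n_a, n_b = sum(comp_a[1]), sum(comp_b[1])
combined = (n_a * mean(*comp_a) + n_b * mean(*comp_b)) / (n_a + n_b)

overall = mean(values, freqs)
print(overall, combined)
```

Both quantities come out equal, as (3) requires, whatever the way in which the values are divided among the components.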

Again, let $u$ and $v$ be two variables with frequency distributions
in which a value of $v$ corresponds to each value of $u$. Then the values
of the variables occur in pairs. Let $N$ be the number of pairs of
values $(u_i, v_i)$, and let

$$x = u + v.$$

The arithmetic mean of the $N$ values of $x$ is equal to that of the $N$
values of the second member, which is $\bar{u} + \bar{v}$. Consequently

$$\bar{x} = \bar{u} + \bar{v}, \tag{4}$$

which expresses that the mean of the sum of two variables is equal
to the sum of their means; and the result can be extended to the sum
of any number of variables. The reader can prove similarly that, if
$a$ and $b$ are constants, and

$$x = au + bv,$$

then

$$\bar{x} = a\bar{u} + b\bar{v}. \tag{4'}$$
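
The relation (4') is easily illustrated by a short computation. The sketch below is in Python and uses hypothetical paired values and constants chosen only for illustration; it is a numerical check, not a proof.

```python
def mean(xs):
    """Arithmetic mean of a list of values, each with frequency unity."""
    return sum(xs) / len(xs)

# Hypothetical paired observations (u_i, v_i) and constants a, b.
u = [1.0, 4.0, 2.0, 7.0]
v = [3.0, 0.0, 5.0, 2.0]
a, b = 2.0, -3.0

# x = a*u + b*v, formed pair by pair.
x = [a * ui + b * vi for ui, vi in zip(u, v)]

lhs = mean(x)                     # mean of the combined variable
rhs = a * mean(u) + b * mean(v)   # right-hand member of (4')
print(lhs, rhs)
```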

Suppose the N values of the variable in the distribution to be
arranged in ascending order of magnitude. Then the median is the
middle value, if N is odd; while, if N is even, it is the arithmetic
mean of the middle pair, or, more generally, it may be regarded as
any value in the interval between these middle values. Similarly,
the quartiles are those values in the range of the variable
which divide the frequency into four equal parts; the second quartile
being identical with the median; and the difference between the
upper and lower quartiles is the interquartile range. The
deciles and percentiles are those values which divide the total
frequency into ten and one hundred equal parts respectively. The
median, quartiles, deciles and percentiles are often spoken of
collectively as partition values, since each set of values divides the
frequency into a number of equal parts. Sometimes they are
referred to as quantiles.

That value of the variable whose frequency is a maximum is 
called a mode, or modal value, of the distribution. When, as usually 
happens, there is only one mode, the distribution is said to be 
unimodal. 

We shall presently consider continuous frequency distributions. 
But it should be pointed out at once that the variable may be either 
continuous or discrete. A continuous variable is one which is capable 
of taking any value between certain limits; for example, the stature 
of an adult man. A discrete variable is one which can take only 
certain specified values, usually positive integers; for example, the 




number of heads in a throw of ten coins, or the number of accidents 
sustained by a worker exposed to a given risk for a given time. Of 
course an observed frequency distribution can only contain a finite 
number of values of the variable, and in this sense all observed 
frequency distributions are discrete. Nevertheless, the distinction 
between continuous and discrete variables will be found to be of 
importance when we come to study populations and probability 
distributions. In the next few sections we shall assume that the 
values of the variables are discrete. 

2. Change of origin and unit 

The following graphical representation will be found helpful.
Taking the usual $x$-axis with origin $O$, we may represent the variable
$x$ by the abscissa of the current point $P$. Then $\bar{x}$ is the abscissa of a
fixed point $G$.

[Fig. 1: the $x$-axis, with the points $O$, $A$, $G$, $P$ marked in that order.]

It is frequently convenient to take a new origin at
some point $A$, whose abscissa is $a$. Let $\xi$ be the abscissa of $P$ relative
to $A$ as origin. Then, since $OP = OA + AP$, we have

$$x = a + \xi.$$

Thus $\xi$ is the excess of $x$ above $a$, or the deviation of $x$ from that
value. Taking the mean of each member of this equation we have

$$\bar{x} = a + \bar{\xi}. \tag{5}$$

Thus $\bar{x} - a$, which is the deviation of the mean value of $x$ from $a$, is
equal to $\bar{\xi}$, which is the mean of the deviations of the values $x_i$ from
$a$. In particular, by taking $a$ as $\bar{x}$ we have the result that the sum of
the deviations of the values from their mean is zero. This is also
easily proved directly; for

$$\sum_i f_i (x_i - \bar{x}) = \sum_i f_i x_i - N\bar{x} = 0 \tag{6}$$

in virtue of (2).



In addition to choosing $A$ as origin it may be convenient to use
a different unit, say $c$ times the original unit. Then, if $u$ is the
deviation of $P$ from $A$ measured in terms of this new unit,

$$u = (x - a)/c$$

or

$$x = a + cu. \tag{7}$$

Taking the mean value of the variable represented by either side
we have, in virtue of (4'),

$$\bar{x} = a + c\bar{u}, \tag{8}$$

$\bar{u}$ being the mean value of $u$ for the distribution.
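
The saving of labour afforded by (7) and (8) can be seen in a short computation. The following Python sketch is an illustration only: the heights, frequencies, origin $a$ and unit $c$ are hypothetical, chosen so that the values of $u$ are small integers.

```python
from fractions import Fraction

# Hypothetical heights in cm with their frequencies.
values = [150, 160, 170, 180, 190]
freqs = [2, 5, 9, 3, 1]
n_total = sum(freqs)

a, c = 170, 10  # new origin and new unit, chosen for convenience

# u = (x - a)/c as in (7); here u takes the values -2, -1, 0, 1, 2.
u = [Fraction(x - a, c) for x in values]
u_bar = sum(f * ui for f, ui in zip(freqs, u)) / n_total

mean_via_u = a + c * u_bar  # relation (8)
mean_direct = Fraction(sum(f * x for f, x in zip(freqs, values)), n_total)
print(mean_via_u, mean_direct)
```

The mean found from the small values of $u$ agrees with that found directly from the original values.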

Example. Eight coins were tossed together, and the number $x$ of heads
resulting was observed. The operation was performed 256 times; and the
frequencies that were obtained for the different values of $x$ are shown in the
following table. Calculate the mean, the median and the quartiles of the
distribution of $x$.

     x       f       ξ      fξ     fξ²     fξ³
     0       1      -4      -4      16     -64
     1       9      -3     -27      81    -243
     2      26      -2     -52     104    -208
     3      59      -1     -59      59     -59
     4      72       0       0       0       0
     5      52       1      52      52      52
     6      29       2      58     116     232
     7       7       3      21      63     189
     8       1       4       4      16      64
  Totals   256       -      -7     507     -37

The different values of $x$ are shown in the first column, and their frequencies
in the second. The calculation is simplified by taking the value $x = 4$ as
origin of $\xi$. The values of $\xi$ corresponding to those of $x$ are given in the third
column, and those of the product $f\xi$ in the fourth. The remaining columns are
not needed for the present. Totals for the various columns are given in the
bottom row. Hence

$$\bar{\xi} = \frac{1}{N}\sum_i f_i \xi_i = -7/256 = -0.027.$$

The mean value of $x$ is therefore

$$\bar{x} = a + \bar{\xi} = 4 - 0.027 = 3.973.$$

The mode, being the value of $x$ with the largest frequency, is clearly 4. To
find the median we observe that the values of $x$ are arranged in ascending
order, and that the 128th and 129th are both 4. Hence the median is also 4.
Similarly the 64th and 65th values are both 3, so that the lower quartile is 3.
In the same way we find that the upper quartile is 5.
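
The arithmetic of this example may be repeated mechanically. The Python sketch below takes the frequencies 1, 9, 26, 59, 72, 52, 29, 7, 1 (total 256) for $x = 0, 1, \ldots, 8$. The function `partition_value` is a hypothetical helper, and its rule for values falling between two ordered observations (take the mean of the neighbouring pair) is one convention among several, adopted here because it reproduces the reasoning of the text.

```python
from fractions import Fraction

x_values = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
n_total = sum(freqs)

# Mean by change of origin to a = 4, as in relation (5).
a = 4
xi_bar = Fraction(sum(f * (x - a) for x, f in zip(x_values, freqs)), n_total)
mean = a + xi_bar

def partition_value(values, freqs, p):
    """Value cutting off a fraction p (0 < p < 1) of the total frequency.

    Assumed convention: if p*N is a whole number k, return the mean of the
    k-th and (k+1)-th ordered values; otherwise the next ordered value.
    """
    ordered = [x for x, f in zip(values, freqs) for _ in range(f)]
    pos = Fraction(p) * len(ordered)
    k = int(pos)
    if pos == k:
        return Fraction(ordered[k - 1] + ordered[k], 2)
    return Fraction(ordered[k])

median = partition_value(x_values, freqs, Fraction(1, 2))
lower_quartile = partition_value(x_values, freqs, Fraction(1, 4))
upper_quartile = partition_value(x_values, freqs, Fraction(3, 4))

# Mode: the value with the largest frequency.
mode = max(zip(freqs, x_values))[1]
print(float(mean), median, lower_quartile, upper_quartile, mode)
```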




3. Variance. Standard deviation

The mean square deviation of the variable $x$ from the value $a$ is,
as the name implies, the mean value of the square of the deviation
of $x$ from $a$. It is therefore given by $\frac{1}{N}\sum_i f_i (x_i - a)^2$ or
$\frac{1}{N}\sum_i f_i \xi_i^2$. The positive square
root of this quantity is the root-mean-square deviation from $a$. In
the important case in which the deviation is taken from the mean of
the distribution, the mean square deviation is called the variance
of $x$, and is denoted by $\mu_2$. The reason for the notation will appear in
the next section. The positive square root of the variance is called
the standard deviation (s.d.) of $x$, and is denoted by $\sigma$. Thus

$$\sigma^2 = \mu_2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2. \tag{9}$$

The variance (or the s.d.) may be taken as an indication of the 
extent to which the values of x are scattered. This scattering is 
called dispersion. When the values of x cluster closely round the 
mean, the dispersion is small. When those values, whose deviations 
from the mean are large, have also relatively large frequencies, the 
dispersion is large. The concepts of variance and s.d. will play a 
prominent part in the following pages. 

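When the mean square deviation from any value $a$ is known, and
also the deviation $\bar{\xi}$ of the mean from that value, the variance is
easily calculated. For, since $x_i - \bar{x} = \xi_i - \bar{\xi}$,

$$\sigma^2 = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - \frac{2\bar{\xi}}{N}\sum_i f_i \xi_i + \bar{\xi}^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - \bar{\xi}^2. \tag{10}$$

This formula is of great importance, and will be constantly employed.
On multiplying by $N$ we have an equivalent relation, which may be
expressed

$$\sum_i f_i \xi_i^2 = N\sigma^2 + N\bar{\xi}^2. \tag{11}$$

Thus $N\sigma^2$ is less than $\sum_i f_i \xi_i^2$, showing that the sum of the squares of
the deviations of the values $x_i$ is least when the deviations are measured
from the mean.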
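
The relation between the variance and the mean square deviation from an arbitrary value $a$ can be verified on a small distribution. The sketch below is in Python with hypothetical values and frequencies; the function name `mean_square_deviation` is an assumption made for illustration. It also confirms, for a range of trial origins, that the mean square deviation is never less than the variance.

```python
from fractions import Fraction

# Hypothetical small distribution.
values = [2, 3, 5, 8]
freqs = [1, 4, 3, 2]
n_total = sum(freqs)

def mean_square_deviation(about):
    """Mean value of the squared deviation of x from the value `about`."""
    total = sum(f * (x - about) ** 2 for x, f in zip(values, freqs))
    return Fraction(total) / n_total

mean = Fraction(sum(f * x for x, f in zip(values, freqs)), n_total)

a = 4                 # arbitrary origin
xi_bar = mean - a     # deviation of the mean from a

variance_direct = mean_square_deviation(mean)               # definition (9)
variance_shortcut = mean_square_deviation(a) - xi_bar ** 2  # mean square about a, less xi_bar squared

# The mean square deviation about any trial value is at least the variance.
least_at_mean = all(mean_square_deviation(t) >= variance_direct for t in range(0, 10))
print(variance_direct, variance_shortcut, least_at_mean)
```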



Another possible measure of dispersion is the mean value of the
absolute deviation from the mean of the distribution, commonly
called the mean deviation from the mean. This quantity, however,
does not lend itself readily to algebraical treatment, and is therefore
not nearly so important as the variance and the s.d. The
semi-interquartile range is also sometimes taken as an indication of the
magnitude of the dispersion.

The significance of the magnitude of the standard deviation
clearly depends upon the values of the variable. Thus a s.d. of 6 in.
in the measurements of the height of a tower, is much less significant
than an equal s.d. in the measurements of the height of a man. The
ratio of the s.d. to the mean value of the variable is called the
coefficient of variation. It is an absolute measure of dispersion in
the sense that it is independent of the unit employed. And by means of
this coefficient we are able to compare the variabilities of distributions
of different characteristics, such as weight and height. Sometimes
the coefficient of variation is defined as 100 times the above
value, i.e. as the percentage of the mean which is equal to the s.d.
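
That the coefficient of variation does not depend on the unit may be illustrated as follows. The Python sketch below uses hypothetical heights recorded in inches and converts them to centimetres; the function name is an assumption for illustration, and floating-point arithmetic is used, so the two results are compared to within a small tolerance.

```python
import math

def coefficient_of_variation(values, freqs):
    """Ratio of the s.d. to the mean for a frequency distribution."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n_total
    return math.sqrt(variance) / mean

# Hypothetical heights in inches, and the same heights in centimetres.
heights_in = [60, 64, 68, 72]
freqs = [1, 3, 4, 2]
heights_cm = [2.54 * h for h in heights_in]

cv_in = coefficient_of_variation(heights_in, freqs)
cv_cm = coefficient_of_variation(heights_cm, freqs)
print(cv_in, cv_cm)
```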



Example 1. In the example of the preceding section the mean square
deviation of $x$ from the value 4 is 507/256 = 1.98. Hence the variance is
given by

$$\sigma^2 = 1.98 - \bar{\xi}^2 = 1.98 - (0.027)^2 = 1.98,$$

and $\sigma = 1.407 = 1.41$ nearly.

From the $f\xi$ column it is clear that the sum of the absolute deviations from
$x = 4$ is 277. By measuring deviations from the mean, instead of from $x = 4$,
we increase the absolute deviations of 161 values, and decrease those of 95
values, by 0.027. Hence the sum of the absolute deviations from the mean is
277 + 66(0.027) = 278.78. The mean deviation is therefore 1.09 approximately.
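
The figures of Example 1 can be reproduced directly. This Python sketch uses the coin frequencies 1, 9, 26, 59, 72, 52, 29, 7, 1 for $x = 0, 1, \ldots, 8$; it works with the unrounded mean, so the later decimal places differ slightly from those obtained with the rounded value 0.027.

```python
import math

x_values = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
n_total = sum(freqs)

mean = sum(f * x for x, f in zip(x_values, freqs)) / n_total

# Mean square deviation from x = 4, then the variance and the s.d.
msd_about_4 = sum(f * (x - 4) ** 2 for x, f in zip(x_values, freqs)) / n_total
variance = msd_about_4 - (mean - 4) ** 2
sd = math.sqrt(variance)

# Sum of absolute deviations from x = 4, and mean deviation from the mean.
abs_dev_about_4 = sum(f * abs(x - 4) for x, f in zip(x_values, freqs))
mean_deviation = sum(f * abs(x - mean) for x, f in zip(x_values, freqs)) / n_total

print(round(variance, 2), round(sd, 2), abs_dev_about_4, round(mean_deviation, 2))
```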


Example 2. Find the mean and the variance for the distribution in which
the values of $x$ are the positive integers 1, 2, 3, ..., $N$, the frequency of each
being unity.

Here

$$\bar{x} = \frac{N(N+1)}{2N} = \tfrac{1}{2}(N+1).$$

The mean square deviation from $x = 0$ is

$$(1^2 + 2^2 + \cdots + N^2)/N = \tfrac{1}{6}(N+1)(2N+1).$$

Hence

$$\sigma^2 = \tfrac{1}{6}(N+1)(2N+1) - \tfrac{1}{4}(N+1)^2 = \tfrac{1}{12}(N^2 - 1).$$
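
The result of Example 2, namely mean $\tfrac{1}{2}(N+1)$ and variance $\tfrac{1}{12}(N^2-1)$, may be confirmed for particular values of $N$. The following Python sketch (the function name is an assumption for illustration) uses exact fractions and the mean square deviation from $x = 0$.

```python
from fractions import Fraction

def mean_and_variance(n):
    """Mean and variance of the integers 1..n, each with frequency unity."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    mean_square_about_zero = Fraction(sum(x * x for x in values), n)
    # Variance = mean square deviation about 0 less the square of the mean.
    return mean, mean_square_about_zero - mean ** 2

# Compare with the closed forms for several values of N.
checks = []
for n in (1, 2, 5, 10, 100):
    mean, variance = mean_and_variance(n)
    checks.append(mean == Fraction(n + 1, 2) and variance == Fraction(n * n - 1, 12))
print(checks)
```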



Example 3. For the distribution expressed by

x =  5   6   7   8   9  10  11  12  13  14  15
f = 18  25  34  47  68  90  80  62  38  27  11

the total frequency is 500. Show that the mean value of $x$ is 10.054, the
variance 5.58, the s.d. 2.36, the median 10, and the lower and upper quartiles 9
and 12 respectively. Also calculate the mean deviation from the mean as
in Ex. 1.
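
A reader who wishes to check his working for Example 3 may use a sketch such as the following, written in Python. It assumes the values $x = 5, 6, \ldots, 15$ with frequencies 18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11, and takes each partition value as the mean of the two ordered values on either side of the division, which is one possible convention. The helper `between` is hypothetical.

```python
import math

x_values = list(range(5, 16))
freqs = [18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11]
n_total = sum(freqs)

mean = sum(f * x for x, f in zip(x_values, freqs)) / n_total
variance = sum(f * (x - mean) ** 2 for x, f in zip(x_values, freqs)) / n_total
sd = math.sqrt(variance)
mean_deviation = sum(f * abs(x - mean) for x, f in zip(x_values, freqs)) / n_total

# All N values written out in ascending order.
ordered = [x for x, f in zip(x_values, freqs) for _ in range(f)]

def between(k):
    """Mean of the k-th and (k+1)-th ordered values (k counted from 1)."""
    return (ordered[k - 1] + ordered[k]) / 2

median = between(n_total // 2)
lower_quartile = between(n_total // 4)
upper_quartile = between(3 * n_total // 4)

print(round(mean, 3), round(variance, 2), round(sd, 2))
print(median, lower_quartile, upper_quartile, round(mean_deviation, 2))
```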

Example 4. A distribution consists of several component distributions.
Express the variance of the whole distribution in terms of those of the
components and the deviations of the means of the components from the
general mean.

Let $n_j$ be the frequency in the $j$th component, $\sigma_j$ its s.d., and $c_j = \bar{x}_j - \bar{x}$ the
deviation of its mean from the general mean. Then the mean square deviation
of this component from the general mean is $\sigma_j^2 + c_j^2$, and the sum of the
squares of its deviations from the general mean is $n_j(\sigma_j^2 + c_j^2)$. Hence the
variance $\sigma^2$ of the whole distribution is given by

$$N\sigma^2 = \sum_j n_j(\sigma_j^2 + c_j^2),$$

where $N$ is the total frequency $\sum_j n_j$.
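
The result of Example 4 may be illustrated numerically. In the Python sketch below the two component distributions are hypothetical, each value having frequency unity, and the function name `mean_var` is an assumption for illustration. The variance of the whole is computed directly and also from the variances and means of the components.

```python
from fractions import Fraction

def mean_var(values):
    """Mean and variance of a list of values, each with frequency unity."""
    n = len(values)
    m = Fraction(sum(values), n)
    return m, Fraction(sum((x - m) ** 2 for x in values)) / n

# Two hypothetical component distributions.
components = [[1, 2, 3], [10, 12]]
whole = [x for comp in components for x in comp]
n_total = len(whole)

general_mean, variance_direct = mean_var(whole)

# N * sigma^2 as the sum of n_j * (sigma_j^2 + c_j^2) over the components.
total = Fraction(0)
for comp in components:
    m_j, var_j = mean_var(comp)
    c_j = m_j - general_mean  # deviation of the component mean
    total += len(comp) * (var_j + c_j ** 2)
variance_from_components = total / n_total

print(variance_direct, variance_from_components)
```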

^4. Moments 

In the notation of the preceding sections the mean value of the rth power of the deviation of the variable from the value a is $\frac{1}{N}\sum f_i (x_i - a)^r$. This is usually called the rth moment of the distribution about the value a, or the moment of order r. The term 'moment' is borrowed from Mechanics. Since $f_i/N$ is the relative frequency of the value $x_i$ in the distribution, and the deviation $\xi_i$ from a is represented by the distance AP, the above expression can be regarded as the sum of the rth moments of the relative frequencies about A. The rth moment about the mean of the distribution is denoted by $\mu_r$. The corresponding moment about a specified value other than the mean will be denoted* by $\mu'_r$. Thus

$$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r \qquad (12)$$

is the rth moment about the mean, while

$$\mu'_r = \frac{1}{N}\sum f_i (x_i - a)^r = \frac{1}{N}\sum f_i \xi_i^r \qquad (13)$$

* An alternative notation, with $\nu_r$ instead of $\mu'_r$, has some advantages.




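is the rth moment about the value a. Putting r = 0 we see that

$$\mu_0 = \mu'_0 = 1. \qquad (14)$$

Similarly, in virtue of (2) and (6), we have

$$\mu'_1 = \bar{\xi}, \qquad \mu_1 = 0. \qquad (15)$$

The second moment about the mean is clearly the variance already discussed.

By means of the binomial expansion, moments about the mean of the distribution may be expressed in terms of moments about any other value, x = a. Thus

$$\mu_r = \frac{1}{N}\sum f_i (\xi_i - \bar{\xi})^r = \mu'_r - \binom{r}{1}\bar{\xi}\,\mu'_{r-1} + \binom{r}{2}\bar{\xi}^2\mu'_{r-2} - \dots + (-1)^r\bar{\xi}^r, \qquad (16)$$

$\binom{r}{s}$ denoting the binomial coefficient, often written $^rC_s$ or $C^r_s$. In particular, in virtue of (14) and (15),

$$\mu_2 = \mu'_2 - 2\bar{\xi}\mu'_1 + \bar{\xi}^2 = \mu'_2 - \bar{\xi}^2, \qquad (17)$$

in agreement with (10). Similarly

$$\mu_3 = \mu'_3 - 3\bar{\xi}\mu'_2 + 2\bar{\xi}^3,$$
$$\mu_4 = \mu'_4 - 4\bar{\xi}\mu'_3 + 6\bar{\xi}^2\mu'_2 - 3\bar{\xi}^4, \qquad (18)$$

and so on.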
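
The binomial expansion above lends itself to mechanical computation. The following Python sketch converts a list of moments about a value a into moments about the mean; the function name `central_from_raw` is hypothetical, and the figures used (moments 1, 16 and -40 about the value 2) are chosen only for illustration.

```python
from math import comb

def central_from_raw(raw):
    """raw[r] is the r-th moment about a, with raw[0] == 1 and raw[1] the mean deviation.
    Return the moments about the mean, term by term from the binomial expansion."""
    xi_bar = raw[1]
    central = []
    for r in range(len(raw)):
        central.append(sum((-1) ** s * comb(r, s) * xi_bar ** s * raw[r - s]
                           for s in range(r + 1)))
    return central

# Moments of orders 0 to 3 about a = 2 (illustrative figures).
mu = central_from_raw([1, 1, 16, -40])
print(mu)   # moments of orders 0, 1, 2, 3 about the mean
```

Here the mean deviation is 1, so the mean is 3; the second entry of the result vanishes, as every first moment about the mean must.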

In calculating moments it is frequently convenient to change the unit. As in § 2, let u be the measure of the deviation from x = a, in terms of a unit c times the original unit, so that $\xi = cu$. Then the rth moment of x about a is

$$\frac{1}{N}\sum f_i \xi_i^r = c^r \cdot \frac{1}{N}\sum f_i u_i^r. \qquad (19)$$

Thus the rth moment of the variable x is $c^r$ times the corresponding moment of the variable u.

A distribution is said to be symmetrical when the frequencies are symmetrically distributed about the mean, that is to say, when values equidistant from the mean have equal frequencies. For example, the distribution expressed by

x = 0 1 2 3 4 5 6 7 8
f = 1 8 28 56 70 56 28 8 1

is symmetrical about its mean $\bar{x} = 4$. In the case of a symmetrical distribution there is the simplification that all the moments of odd order about the mean are equal to zero, since the terms of the sum in (12) cancel in pairs. In the case of an unsymmetrical distribution, the degree of departure from symmetry is called its skewness. More than one measure of this property has been proposed. One of the simplest is $\mu_3/\sigma^3$, while another is half this expression. These are clearly independent of the unit chosen for the variable, and they vanish if the distribution is symmetrical. Another measure of skewness, proposed by Karl Pearson, will be given later.

Example 1. For the distribution of x in the example of § 2, the third moment about x = 0 is

$$\mu'_3 = -37/255 = -0.145.$$

Hence the third moment about the mean is given by

$$\mu_3 = \mu'_3 - 3\bar{\xi}\mu'_2 + 2\bar{\xi}^3 = -0.145 - 3(-0.027)(1.98) + 2(-0.027)^3 = 0.018 \text{ nearly.}$$

The skewness, calculated from the formula $\mu_3/\sigma^3$, is

0.018/2.8 = 0.0064,

which is very small.

Example 2. For the distribution in § 3, Ex. 3, show that $\mu_3 = -1.92$, and deduce that $\mu_3/\sigma^3 = -0.146$.
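
As a check on Example 2, and on the statement that odd moments vanish for a symmetrical distribution, the moments may be computed directly from definition (12). The sketch below is illustrative; `central_moment` and `skewness` are hypothetical helper names, and the two distributions are entered as x = 5, ..., 15 and x = 0, ..., 8 respectively.

```python
def central_moment(xs, fs, r):
    """r-th moment about the mean of a discrete frequency distribution."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n

def skewness(xs, fs):
    """The measure mu_3 / sigma^3."""
    return central_moment(xs, fs, 3) / central_moment(xs, fs, 2) ** 1.5

# Distribution of § 3, Ex. 3.
xs = list(range(5, 16))
fs = [18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11]
mu3 = central_moment(xs, fs, 3)
print(round(mu3, 2), round(skewness(xs, fs), 3))

# A symmetrical distribution: the third moment about the mean vanishes.
sym_x = list(range(9))
sym_f = [1, 8, 28, 56, 70, 56, 28, 8, 1]
sym_skew = skewness(sym_x, sym_f)
print(sym_skew)
```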

5. Grouped distribution 

Frequently the number of different values of the variable represented in the distribution is so large that, for convenience in calculating the moments, it becomes necessary to approximate by grouping the values. In such cases the range of variation of x is usually divided into a number of equal intervals. The group of values falling in a given interval constitutes a class; and the number of such values is the class frequency. The magnitude of an interval is called the class interval. For simplicity of calculation the number of intervals chosen will not be too large, preferably not more than 20, or at the most 25; and, in order that the results may be sufficiently accurate, the number must not be too small, preferably not less than 12. The approximation in the calculation of moments consists in regarding each value as equal to the mid-value of the interval in which it falls. We shall later meet formulae,* known as Sheppard's adjustments, giving the corrections that may be applied under certain conditions to the approximate values of the moments calculated as above. We merely mention here that, under the conditions referred to, the correction to the mean may be neglected, while the calculated variance should be reduced by $c^2/12$, c being the magnitude of the class interval. The following is an example of the treatment of a distribution by grouping.

Example. Over a period of years, 570 students were examined in Mathematics I at the annual examinations of the University of Western Australia. The marks gained by students ranged from 0 to 99, all being integers. These were grouped in 20 classes, with a class interval of 5, the class frequencies being as shown in the accompanying table. The mid-values of the intervals are 2, 7, 12, ..., 97. The calculation is simplified by taking the class interval as a new unit (c = 5), and the value x = 52 as a new origin. The deviations u from this origin, measured in class intervals, are shown in the fourth column. The above choice of origin ensures that the larger frequencies are multiplied by the smaller values of u. In the row corresponding to u = 0 the entries for fu, fu², etc., are all zero, and need not be recorded. The space may therefore be used to record the sums of the negative numbers above this row. The sums of the positive numbers are similarly indicated at the bottom; and the total for each column is given in the last row.

Percentages in Mathematics

Interval   Mid-value     f      u       fu      fu²        fu³
 0 to  4       2        12    -10     -120    1,200    -12,000
 5 to  9       7        13     -9     -117    1,053     -9,477
10 to 14      12        13     -8     -104      832     -6,656
15 to 19      17        14     -7      -98      686     -4,802
20 to 24      22        23     -6     -138      828     -4,968
25 to 29      27        23     -5     -115      575     -2,875
30 to 34      32        29     -4     -116      464     -1,856
35 to 39      37        34     -3     -102      306       -918
40 to 44      42        44     -2      -88      176       -352
45 to 49      47        44     -1      -44       44        -44
50 to 54      52        50      0   -1,042              -43,948
55 to 59      57        52      1       52       52         52
60 to 64      62        61      2      122      244        488
65 to 69      67        41      3      123      369      1,107
70 to 74      72        32      4      128      512      2,048
75 to 79      77        27      5      135      675      3,375
80 to 84      82        23      6      138      828      4,968
85 to 89      87        17      7      119      833      5,831
90 to 94      92        13      8      104      832      6,656
95 to 99      97         5      9       45      405      3,645
                                        966               28,170
Totals         -       570      -      -76   10,914    -15,778

* See § 16.

The mean value of u is therefore

$$\bar{u} = \frac{1}{N}\sum fu = -76/570 = -2/15 = -0.133,$$

and the mean percentage

$$\bar{x} = a + c\bar{u} = 52 - 0.667 = 51.333.$$

The mean square deviation of u from u = 0 has the value 10,914/570 = 19.15; and the variance of u, which may be denoted by $\sigma_u^2$, is

$$\sigma_u^2 = 19.15 - \bar{u}^2 = 19.13.$$

Consequently $\sigma_u = 4.374$, and therefore $\sigma_x = 21.87$. If, however, Sheppard's adjustment is made, we find

$$\sigma_u^2 = 19.047, \qquad \sigma_u = 4.365, \qquad \sigma_x = 21.82.$$

The third moment of u about u = 0 has the value -15,778/570 = -27.68. Thus, in virtue of (18),

$$\mu_3 = -27.68 + 7.66 - 0.005 = -20.03.$$

The skewness of the distribution, calculated from the formula $\mu_3/\sigma^3$, is -0.24 nearly. The reader may verify that the mean deviation of x from the mean is about 17.7.

The partition values may be estimated on the assumption that the frequency in any class is evenly distributed among the values in that class. The reader will find in this way that the median is 53, and the lower and upper quartiles 37 and 66 respectively.
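
The whole calculation of this example can be reproduced in a few lines. The following Python sketch assumes the twenty class frequencies of the table (12, 13, 13, 14, ..., 13, 5, with 50 in the class 50 to 54), the working origin x = 52 and the unit c = 5. The helper `partition` and the choice of -0.5 as the lower boundary of the first class (marks being integers) are assumptions made for the interpolation.

```python
from math import sqrt

c, origin = 5, 52            # class interval and working origin (mid-value of 50 to 54)
freqs = [12, 13, 13, 14, 23, 23, 29, 34, 44, 44,
         50, 52, 61, 41, 32, 27, 23, 17, 13, 5]
us = list(range(-10, 10))    # deviations from the origin, in class intervals

n = sum(freqs)
# Moments of u about u = 0, of orders 0 to 3.
raw = [sum(f * u ** r for f, u in zip(freqs, us)) / n for r in range(4)]
u_bar = raw[1]
mean_x = origin + c * u_bar
var_u = raw[2] - u_bar ** 2
sd_x = c * sqrt(var_u)
# Sheppard's adjustment: reduce the variance by c^2/12, i.e. by 1/12 in units of u.
sd_x_adjusted = c * sqrt(var_u - 1 / 12)
mu3_u = raw[3] - 3 * u_bar * raw[2] + 2 * u_bar ** 3
skew = mu3_u / var_u ** 1.5

def partition(freqs, lower_bound, width, rank):
    """Interpolate within the class containing the item of the given rank,
    assuming the class frequency is evenly spread over the class."""
    cumulative = 0
    for k, f in enumerate(freqs):
        if cumulative + f >= rank:
            return lower_bound + width * (k + (rank - cumulative) / f)
        cumulative += f
    raise ValueError("rank exceeds the total frequency")

median = partition(freqs, -0.5, c, n / 2)
lower_q = partition(freqs, -0.5, c, n / 4)
upper_q = partition(freqs, -0.5, c, 3 * n / 4)
print(round(mean_x, 3), round(sd_x, 2), round(sd_x_adjusted, 2),
      round(mu3_u, 2), round(skew, 2), round(median), round(lower_q), round(upper_q))
```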

6. Continuous distributions 

The distributions considered so far are discrete distributions, or distributions of discrete values of the variable. A continuous distribution is one in which the variable takes every value between certain limits, a and b. The total frequency is therefore infinite; and so is the frequency within any finite interval, $\alpha$ to $\beta$, of the range of the variable. We shall confine our attention to continuous distributions which are such that the relative frequency in the infinitesimal interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$ is expressible as $f(x)\,dx$, where $f(x)$ is a continuous function of x, called the relative frequency density. Then the continuous curve

$$y = f(x) \qquad (20)$$

is the relative frequency curve for the distribution. If this curve is symmetrical about some line x = c, the distribution is said to be symmetrical. The infinitesimal interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$, with mid-value x and magnitude dx, may be conveniently referred to as the interval dx. The relative frequency $f(x)\,dx$ in this interval is represented by the area under the curve (20), between the ordinates at the ends of the interval. Hence the relative frequency for the interval $\alpha$ to $\beta$ is given by the integral $\int_\alpha^\beta f(x)\,dx$. The sum of all the relative frequencies is unity, so that

$$\int_a^b f(x)\,dx = 1. \qquad (21)$$


As in the case of a discrete distribution, the moment of order r about a specified value is the sum of the rth moments of the relative frequencies about that value. Hence the mean $\bar{x}$, which is the first moment about the origin, is given by

$$\bar{x} = \int_a^b x f(x)\,dx. \qquad (22)$$

The moments of order r, about the origin and the mean, are

$$\mu'_r = \int_a^b x^r f(x)\,dx, \qquad \mu_r = \int_a^b (x - \bar{x})^r f(x)\,dx.$$

And, in virtue of the binomial expansion, it follows as in § 4 that the relations (16)-(18) hold for a continuous distribution also. In particular, the variance of the distribution is given by

$$\sigma^2 = \mu_2 = \mu'_2 - \bar{x}^2. \qquad (23)$$


Example 1. By way of illustration consider a straight rod of length l. The distance x of a molecule of the rod from one end may be regarded as a continuous variable, ranging from 0 to l; and the distribution of the values of x is then continuous. If M is the mass of the rod, and the linear density m is continuous, the function $f(x)$ for the distribution has the value m/M. In the case of a uniform rod, of length 2a, the distance x of a molecule from the middle point varies continuously from -a to a, and the distribution of x is continuous with $f(x) = 1/2a$. The frequency curve is a straight line parallel to the x-axis, and the distribution of x is said to be rectangular or uniform. The mean value of x is now zero, and the variance is given by

$$\sigma^2 = \frac{1}{2a}\int_{-a}^{a} x^2\,dx = \frac{a^2}{3}.$$

All the moments of odd order about the mean are zero. The moment of order 2n is easily shown to be $a^{2n}/(2n+1)$.

Example 2. Draw the frequency curve for the symmetrical distribution in which

$$f(x) = \frac{2a}{\pi}\left(\frac{1}{a^2 + x^2}\right),$$

the range of the variable being -a to a; and show that $f(x)$ satisfies the condition (21). Also show that the variance is

$$\sigma^2 = a^2(4 - \pi)/\pi = 0.273a^2,$$

and the s.d. $\sigma = 0.52a$, while

$$\mu_4 = a^4(1 - 8/3\pi) = 0.151a^4.$$
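
The integrals of Example 2 can be checked by numerical quadrature. The sketch below assumes the density $f(x) = 2a/\pi(a^2 + x^2)$ on the range -a to a, takes a = 1 for convenience, and uses a hand-written composite Simpson rule (`simpson` is an illustrative helper, not a library routine).

```python
from math import pi

def simpson(g, lo, hi, steps=2000):
    """Composite Simpson's rule; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

a = 1.0   # any positive scale; the moment of order r scales as a**r

def f(x):
    """Relative frequency density of Example 2."""
    return (2 * a / pi) / (a * a + x * x)

area = simpson(f, -a, a)                              # condition (21)
variance = simpson(lambda x: x ** 2 * f(x), -a, a)    # the mean is zero by symmetry
mu4 = simpson(lambda x: x ** 4 * f(x), -a, a)
print(round(area, 6), round(variance, 3), round(mu4, 3))
```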


It is apparent from the above that a discrete distribution cannot be identified with a continuous one. Sometimes, however, a discrete distribution $D_1$ approximates to a certain continuous distribution $D_2$ in the sense that, for any interval in the range of the variable, the relative frequency of $D_1$ is approximately equal to that of $D_2$. Such an approximation requires that the total frequency N of $D_1$ should be large, and that there should be only very small intervals between pairs of adjacent values of the variable. Consider, for example, the distribution of ages of all living people, not more than (say) 50 years old at a specified instant. The variable x ranges from 0 to 50 years; and the frequency in any interval, not less than a few hours, is so large that, since a period of a few hours is very small compared with 50 years, the distribution may be regarded as continuous for all practical purposes. If the necessary data were available, we could calculate a continuous function $f(x)$ to give the relative frequency density to any reasonable degree of accuracy.

The diagram illustrates the frequency curve of an unsymmetrical, unimodal, continuous distribution. The mode of the distribution is the abscissa of the maximum ordinate. Since the area under the curve in any interval represents the relative frequency for that interval, it follows that the median is the abscissa of the ordinate MM', which bisects the area under the curve. And from (22) we see that the mean is the abscissa of the ordinate OO', which passes through the centroid of the area under the curve. We may mention in passing that, when the skewness is small, the relation

mean - mode = 3(mean - median)

holds fairly accurately. The student may regard this as an empirical relation, since we shall not give any proof. The definition of skewness most frequently used is that of Karl Pearson. It is

$$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{s.d.}}$$

When the mode can be accurately determined this definition is a convenient one. The frequency curve in the diagram is that of a positively skew distribution, the longer tail of the curve being to the right.

The ratio of the fourth moment about the mean of a distribution, to the square of the variance, is independent of the unit employed. This invariant of the distribution is called its kurtosis, and is frequently denoted by $\beta_2$. Thus

$$\beta_2 = \mu_4/\mu_2^2. \qquad (24)$$

In the normal distribution, to be considered later, the kurtosis has the value 3. Since this distribution is regarded as the standard or ideal, the quantity $\beta_2 - 3$ for any distribution is called its excess of kurtosis, or briefly its excess. Corresponding to the above notation, $\beta_1$ is used to denote the invariant $\mu_3^2/\mu_2^3$. This is the square of the skewness as defined in § 4.
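
The invariants $\beta_1$ and $\beta_2$ are readily computed for a discrete distribution. The following sketch is illustrative only: the function name `beta_coefficients` is hypothetical, and the data are the symmetrical distribution with frequencies 1, 8, 28, 56, 70, 56, 28, 8, 1 on the values 0 to 8.

```python
def beta_coefficients(xs, fs):
    """Return (beta_1, beta_2) = (mu_3^2 / mu_2^3, mu_4 / mu_2^2)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    # Moments about the mean, of orders 0 to 4.
    mu = [sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n for r in range(5)]
    return mu[3] ** 2 / mu[2] ** 3, mu[4] / mu[2] ** 2

beta1, beta2 = beta_coefficients(range(9), [1, 8, 28, 56, 70, 56, 28, 8, 1])
excess = beta2 - 3
print(beta1, beta2, excess)
```

For this distribution $\beta_1$ vanishes, as it must for any symmetrical distribution, and the excess is negative.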

COLLATERAL READING*

Yule and Kendall, 1937, 1, chapters vi-ix.

Kenney, 1939, 3, part i, chapters i-v.

Camp, 1934, 1, part i, chapters i-iv.

Fisher, 1938, 2, chapters i and ii.

Mills, 1938, 1, chapters iv and v.

Jones, 1924, 3, chapters i-vii.

Rider, 1939, 6, chapters i and ii.

Tippett, 1931, 2, chapters i and ii.

Rietz (ed.), 1924, 1, chapter ii.

Goulden, 1939, 2, chapters i and ii.

Bowley, 1920, 2, part i, chapters v and vi; part ii, chapter i.

Kendall, 1938, 6,


EXAMPLES I 

1. A distribution consists of three components with frequencies of 200, 250 and 300, having means of 25, 10 and 15, and standard deviations of 3, 4 and 5 respectively. Show that the mean of the combined distribution is 16, and its s.d. 7.2 approximately.

2. The yields of grain (x lb.) from 500 small plots are grouped in classes with a common class interval (0.2 lb.) in the table below, the values of x given being the mid-values of the classes. Show that the mean of the distribution is 3.95 lb., its s.d. 0.46 lb., the median also 3.95 lb., and the quartiles 3.63 and 4.28 lb.

  x     f      x     f      x     f      x     f      x     f
 2.8    4     3.4   47     4.0   88     4.6   35     5.2    4
 3.0   15     3.6   63     4.2   69     4.8   10      -     -
 3.2   20     3.8   78     4.4   59     5.0    8      -     -

(Mercer and Hall, Journ. Agric. Sci. 1911, vol. 4, p. 107.)


* The references are to the literature listed at the end of the book. The student is strongly advised to read some of the literature mentioned at the end of each chapter, preferably in the order given. He will thus acquire a better background for the mathematical theory.

3. The wages of 1,000 employees range from 4s. 6d. to 19s. 0d. They are grouped in 15 classes with a common class interval of 1s., and the class frequencies, from the lowest class to the highest, are 6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6. Tabulate the data, and show that the mean wage is 12.006s., the s.d. 2.626s. = 2s. 7½d., and the median 12.127s. = 12s. 1½d., and the mode 12.36s. = 12s. 4¼d. nearly. (Adjusted s.d. 2.61s.)

4. With the notation of § 4, and by a method similar to that used in proving (16), show that

$$\mu'_4 = \mu_4 + 4\bar{\xi}\mu_3 + 6\bar{\xi}^2\mu_2 + \bar{\xi}^4,$$

$$\mu'_r = \mu_r + r\bar{\xi}\mu_{r-1} + \binom{r}{2}\bar{\xi}^2\mu_{r-2} + \dots + \binom{r}{2}\bar{\xi}^{r-2}\mu_2 + \bar{\xi}^r.$$

5. The first three moments of a distribution about the value 2 of the variable are 1, 16 and -40. Show that the mean is 3, the variance 15, and $\mu_3 = -86$. Also show that the first three moments about x = 0 are 3, 24 and 76.


6. Prove that the mean deviation from the median is less than 
that measured from any other value. (See Aitken, 1939, i, p. 32.) 

7. Show that, if the class interval of a grouped distribution is less than one-third of the calculated s.d., Sheppard's adjustment makes a difference of less than ½ % in the estimate of the s.d.


8. Show that, if the variable takes the values 0, 1, 2, 3, ..., n with frequencies proportional to the binomial coefficients $1, n, \binom{n}{2}, \dots, n, 1$ respectively, then the mean of the distribution is $\tfrac{1}{2}n$, the second moment about x = 0 is $n(n+1)/4$, and the variance is $n/4$.

9. In a continuous distribution, whose relative frequency density is given by $f(x) = 3x(2 - x)/4$, the variable ranges from 0 to 2. Show that the distribution is symmetrical, with mean $\bar{x} = 1$, and variance 1/5. Show that the second and third moments about x = 0 are 6/5 and 8/5 respectively; and verify that $\mu_3 = 0$.



10. Exponential distribution. Consider the continuous distribution in which $f(x) = ae^{-ax}$, a being positive, and the variable ranging from 0 to $\infty$. Show that the mean is 1/a and the variance $1/a^2$. Also prove that the second and third moments about x = 0 are $2/a^2$ and $6/a^3$ respectively, and that $\mu_3 = 2/a^3$.

11. Factorial moments. The factorial moments of a distribution are defined as follows. That of order r about the origin (x = 0) is

$$\mu'_{(r)} = \frac{1}{N}\sum f_i x_i^{(r)}, \quad \text{where } x^{(r)} = x(x-1)(x-2)\dots(x-r+1).$$

Show that factorial moments are related to ordinary moments by the equations

$$\mu'_{(1)} = \mu'_1 = \bar{x}, \qquad \mu'_{(2)} = \mu'_2 - \bar{x},$$

$$\mu'_{(3)} = \mu'_3 - 3\mu'_2 + 2\bar{x},$$

$$\mu'_{(4)} = \mu'_4 - 6\mu'_3 + 11\mu'_2 - 6\bar{x},$$

and so on. Similarly

$$\mu'_2 = \mu'_{(2)} + \bar{x}, \qquad \mu'_3 = \mu'_{(3)} + 3\mu'_{(2)} + \bar{x},$$

$$\mu'_4 = \mu'_{(4)} + 6\mu'_{(3)} + 7\mu'_{(2)} + \bar{x}.$$

The expression for the factorial moment about the mean is obtained from that defining $\mu'_{(r)}$ by replacing x by the deviation $x - \bar{x}$ from the mean. Relations corresponding to the above are obtained by dropping the dashes and putting $\bar{x} = 0$. The student may then easily verify that factorial moments about the mean are connected with those about the origin by the equations

$$\mu_{(3)} = \mu'_{(3)} - 3\mu'_{(2)}\bar{x} + 2\bar{x}(\bar{x}^2 - 1),$$

$$\mu_{(4)} = \mu'_{(4)} - 4\mu'_{(3)}\bar{x} + 6\bar{x}(\bar{x} + 1)\mu'_{(2)} - 3\bar{x}(\bar{x}^3 + 2\bar{x}^2 - \bar{x} - 2).$$



12] Contintums Distribtitions 81 

rth moment about the mean of the distribution is

$$\mu_r = \int [x - E(x)]^r \phi(x)\,dx, \qquad (17)$$

which is the expected value of the rth power of the deviation of x from the mean. The variance of x, being the second moment about the mean, is given by

$$\mu_2 = \int [x - E(x)]^2 \phi(x)\,dx$$

or

$$\sigma^2 = E\{[x - E(x)]^2\}.$$

The theorems of § 10 are equally true for continuous distributions. Let the values of x range from a to b, and those of a second variate y from c to d. We may assume that the probability that x falls in the interval dx and, at the same time, y falls in the interval dy is jointly proportional to dx and dy, and expressible in the form $\phi(x, y)\,dx\,dy$, where $\phi(x, y)$ is a continuous function of the two variates. Then the expected value of the sum of the variates is

$$E(x+y) = \int_a^b\!\int_c^d (x+y)\,\phi(x,y)\,dx\,dy = \int_a^b\!\int_c^d x\,\phi(x,y)\,dx\,dy + \int_a^b\!\int_c^d y\,\phi(x,y)\,dx\,dy.$$

In the first of these two integrals let the integration with respect to y be performed first. Then $dx\int_c^d \phi(x,y)\,dy$ is the probability that x will fall in the interval dx irrespective of the value of y. Denote it by $\phi_1(x)\,dx$. Similarly, in the second integral, $dy\int_a^b \phi(x,y)\,dx$ is the probability that y will fall in the interval dy irrespective of the value of x. Denote it by $\phi_2(y)\,dy$. Then

$$E(x+y) = \int_a^b x\,\phi_1(x)\,dx + \int_c^d y\,\phi_2(y)\,dy = E(x) + E(y), \qquad (19)$$

as required.

In considering the expected value of the product xy we assume that the variates are independent. Then the probability that the value of x falls in the interval dx is independent of the value of y, and may be expressed as $\phi_1(x)\,dx$. Similarly, the probability that y falls in the interval dy is independent of the value of x, and is expressible in the form $\phi_2(y)\,dy$. Therefore the probability that x falls in the interval dx and, at the same time, y falls in the interval dy is $\phi_1(x)\phi_2(y)\,dx\,dy$. Accordingly, the expected value of the product xy is

$$\int_a^b\!\int_c^d xy\,\phi_1(x)\phi_2(y)\,dx\,dy = \int_a^b x\,\phi_1(x)\,dx \int_c^d y\,\phi_2(y)\,dy = E(x)\,E(y).$$

Thus the expected value of the product of two independent variates is equal to the product of their expected values. And it follows, as in § 10, that the covariance of two independent variates is equal to zero.
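
The two theorems on expected values may be illustrated with exact arithmetic. The sketch below uses a discrete analogue of the argument (sums in place of integrals), which is an assumption of the illustration and not the continuous case treated above; the two marginal distributions `px` and `py` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distributions of two independent variates.
px = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
py = {-1: Fraction(1, 3), 2: Fraction(2, 3)}

def expect(dist):
    """Expected value of a discrete variate given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

# Independence: the joint probability is the product of the marginal probabilities.
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

e_sum = sum((x + y) * p for (x, y), p in joint.items())
e_prod = sum(x * y * p for (x, y), p in joint.items())
covariance = e_prod - expect(px) * expect(py)
print(e_sum, e_prod, covariance)
```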


Example 1. Show that, if the variable x has uniform distribution of probability over the range -a to +a, then $\phi(x) = 1/2a$, $\bar{x} = 0$, $\sigma^2 = a^2/3$, the moments of odd order about x = 0 are zero, and that of order 2n is $a^{2n}/(2n+1)$.

Example 2. Through a point B on the y-axis, whose ordinate is positive and equal to a, a straight line is drawn in a direction taken at random in the interval $\theta = -\tfrac{1}{4}\pi$ to $\theta = \tfrac{1}{4}\pi$, $\theta$ being the inclination of the line to BO. Examine the probability distribution of the intercept x on the x-axis.

We interpret the data as meaning that the variable $\theta$ has uniform distribution of probability in the interval $-\tfrac{1}{4}\pi$ to $\tfrac{1}{4}\pi$. Hence the probability that $\theta$ will fall in the interval $d\theta$ is $2\,d\theta/\pi$. Now the intercept on the x-axis has the value $x = a\tan\theta$, so that $\theta = \arctan(x/a)$, and therefore

$$d\theta = a\,dx/(a^2 + x^2).$$

But, when $\theta$ falls in the interval $d\theta$, x falls in the corresponding interval dx. Hence the probability density for the distribution of x is

$$\phi(x) = \frac{2a}{\pi(a^2 + x^2)}.$$

This is also the relative frequency density of the distribution in § 6, Ex. 2, and the properties of the distributions are the same. There is symmetry about x = 0, and the s.d. is 0.52a.

13. Theorems of Tchebychef and Bernoulli

Consider first a theorem due to Tchebychef. Let x be a variate, either discrete or continuous, with s.d. $\sigma$. The theorem to be proved fixes an upper limit to the probability that a value of the variate, chosen at random, will differ from the mean by more than $\lambda\sigma$, where $\lambda$ is a given positive number. Tchebychef's theorem may be stated:

In a random choice of a value of a variate, whose standard deviation is $\sigma$, the probability that the value chosen will differ from the mean by more than $\lambda\sigma$ does not exceed $1/\lambda^2$.

To prove it we observe that the variance $\sigma^2$ is the second moment of the probability of the whole distribution about the mean. Now, if the combined probability P of values further than $\lambda\sigma$ from the mean were greater than $1/\lambda^2$, the second moment of the probability of these values alone would exceed $(\lambda\sigma)^2/\lambda^2$, i.e. $\sigma^2$, and that of the whole distribution would, a fortiori, be greater than $\sigma^2$. Since this is not so the statement in the theorem must be true. Thus

$$P \le 1/\lambda^2. \qquad (21)$$

In particular the probability that a value of the variate will differ from the mean by more than $3\sigma$ does not exceed 1/9. The theorem is a very conservative one, since the actual value of P is usually very much less than $1/\lambda^2$. The result, however, applies to all distributions; and we shall now use it to prove a theorem due to Bernoulli.
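
How conservative the inequality is can be seen from a numerical case. The following sketch is illustrative: it treats the relative frequencies of a discrete distribution (values 5 to 15 with frequencies 18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11) as probabilities, and the function name `tail_probability` is hypothetical.

```python
from math import sqrt

def tail_probability(xs, ps, lam):
    """P(|x - mean| > lam * sd) for a discrete distribution with probabilities ps."""
    mean = sum(p * x for x, p in zip(xs, ps))
    sd = sqrt(sum(p * (x - mean) ** 2 for x, p in zip(xs, ps)))
    return sum(p for x, p in zip(xs, ps) if abs(x - mean) > lam * sd)

xs = list(range(5, 16))
fs = [18, 25, 34, 47, 68, 90, 80, 62, 38, 27, 11]
ps = [f / sum(fs) for f in fs]

# Compare the actual probability with Tchebychef's upper limit 1 / lam^2.
results = [(lam, tail_probability(xs, ps, lam), 1 / lam ** 2) for lam in (1.5, 2.0, 3.0)]
for lam, actual, limit in results:
    print(lam, round(actual, 3), round(limit, 3))
```

In each case the actual probability falls well below the limit, in keeping with the remark that the theorem is a conservative one.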

Let m be the number of successes obtained in n independent trials, in which the constant probability of the occurrence of the event A is p. The quotient m/n is the relative frequency of successes. How does this quotient behave as n increases indefinitely? It is a matter of common knowledge that, in trials of this nature in which the value of p is known, the relative frequency obtained is usually a close approximation to p when n is large. But there is no proof, based on the measure of probability given in § 7, that the relative frequency of successes will have p as limiting value when n tends to infinity. James Bernoulli, however, proved that, given any positive number $\epsilon$ however small, the probability of $|m/n - p|$ exceeding $\epsilon$ tends to zero as n tends to infinity. This is expressed in modern terminology by saying that m/n converges in probability to p as n tends to infinity. Bernoulli's theorem may also be expressed in the form:

Let $\epsilon$ and $\eta$ be two given positive numbers, however small, and let m be the number of successes in n independent trials, in which the constant probability of success is p. Then the probability that the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon \qquad (22)$$

will hold is greater than $1 - \eta$, provided that n is greater than a certain number N, depending on $\epsilon$ and $\eta$.

A simple proof may be given as follows. It was shown in § 11 that the mean value of the relative frequency of successes in such a series of trials is p, and its variance $\sigma^2$ is pq/n. If then we write $\epsilon = \lambda\sigma$, the condition

$$\left|\frac{m}{n} - p\right| > \epsilon$$

is the condition that the relative frequency of successes should differ from its mean by more than $\lambda\sigma$. But, by Tchebychef's theorem, the probability of this does not exceed $1/\lambda^2$, so that

$$P \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\epsilon^2} = \frac{pq}{n\epsilon^2},$$

and this is less than $\eta$ for all values of n greater than $pq/\eta\epsilon^2$. Hence Bernoulli's theorem.
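
The number $pq/\eta\epsilon^2$ occurring in the proof is easily evaluated, and the exact binomial probability can be compared with the bound. The sketch below is an illustration under assumed figures (p = 0.5, $\epsilon$ = 0.1, $\eta$ = 0.05); the function name `exceed_probability` is hypothetical.

```python
from math import comb

def exceed_probability(n, p, eps):
    """Exact P(|m/n - p| > eps) when m is the number of successes in n trials."""
    q = 1 - p
    return sum(comb(n, m) * p ** m * q ** (n - m)
               for m in range(n + 1) if abs(m / n - p) > eps)

p, eps, eta = 0.5, 0.1, 0.05
bound_n = p * (1 - p) / (eta * eps ** 2)   # the number pq / (eta * eps^2) of the proof
n = int(bound_n) + 1                       # a number of trials exceeding that bound
actual = exceed_probability(n, p, eps)
print(n, actual, eta)
```

The exact probability is found to be far smaller than $\eta$, which again reflects the conservative character of Tchebychef's inequality.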

The theorem does not assert that the inequality (22) must hold for all values of n greater than N, but that the probability of its not holding is less than $\eta$. Even this probability, however small, leaves room for the possibility that it may not hold on some particular occasion. Bernoulli's theorem explains the practice of taking m/n, for a large value of n, as an approximation to the value of p. Indeed, it frequently happens that this use of relative frequency is our only means of estimating the probability of the event.

14. Empirical definition of probability

As already indicated the definition of probability in terms of equally likely cases does not lend itself to every instance in which a numerical evaluation of probability is desired. Another definition of the probability of an event is sometimes given in terms of the relative frequency of the occurrence of the event in an extended series of trials. The fundamental assumption for such a definition is that this relative frequency, in a uniform series of trials, tends to a definite limit as the number of trials in the series tends to infinity; and this limit is taken as the measure of the probability of the occurrence of the event in another such trial. No proof can be given that the above relative frequency does tend to a limit. Convergence in probability, mentioned in connection with Bernoulli's theorem, is a consequence of the original definition, and is not the same thing as the convergence to a limit assumed in the empirical definition. But, though the two approaches are not theoretically equivalent, they may be regarded as in agreement for practical purposes.

By means of the assumption on which the empirical definition is based, the laws of addition and multiplication of probabilities can be deduced. Suppose, as in § 8, that a trial may result in any one of the mutually exclusive events $A_1, A_2, \dots, A_k$. In a series of n trials let $m_i$ be the number of times in which the event $A_i$ happens. Then the probability of the happening of this event is given by

$$p_i = \lim_{n\to\infty} \frac{m_i}{n}.$$

Since the number of times in which one of the events $A_1, A_2, \dots, A_k$ happens is $\sum_{i=1}^{k} m_i$, the probability of the happening of one of these events is

$$p = \lim_{n\to\infty} \sum_{i=1}^{k} \frac{m_i}{n} = \sum_{i=1}^{k} \lim_{n\to\infty} \frac{m_i}{n} = \sum_{i=1}^{k} p_i,$$

and we have the theorem of total probability as in § 8.

Suppose next that A and B are different events that may happen as the results of specified trials T and T' respectively. We require the probability that, in a pair of such trials, both events will happen. In a series of n pairs of trials let m be the number of times in which A happens, and $m_1$ the number of times in which both A and B happen. These occasions are all included in the m occasions on which A happens. Then the probability of the happening of both events is

$$p = \lim_{n\to\infty} \frac{m_1}{n} = \lim_{n\to\infty} \left(\frac{m}{n}\cdot\frac{m_1}{m}\right) = \lim_{n\to\infty} \frac{m}{n} \cdot \lim_{m\to\infty} \frac{m_1}{m},$$

since m tends to infinity with n, if the probability of A is not zero. Now $\lim m/n$ is the probability of A, and $\lim m_1/m$ is the conditional probability of B on the assumption that A has happened. We thus have the theorem of compound probability as in § 8. In the particular case of independence the probability of the occurrence of the two events is simply the product of the probabilities of the separate events. As we have already seen, the theorems of total and compound probability are the foundations of the mathematical theory. The measure of probability by relative frequency is also fundamental. In the a priori definition it is the relative frequency of favourable cases in the total number of cases, while, in the empirical definition, it is the limit of the relative frequency of the happenings of the event under consideration.


15. Moment generating function and characteristic function

Let $\phi(x)$ be the probability density in the distribution of the variate x. The expected value of $e^{tx}$ is a function of t given by

$$M(t) = \int e^{tx}\phi(x)\,dx, \qquad (23)$$

where the integration is taken over the whole range of x. When this integral has a meaning for a certain range of values of t, we may expand the exponential and integrate term by term, thus obtaining the formula

$$M(t) = 1 + \mu'_1 t + \mu'_2 \frac{t^2}{2!} + \dots + \mu'_r \frac{t^r}{r!} + \dots, \qquad (24)$$

where $\mu'_r = \int x^r \phi(x)\,dx$, being the moment of order r about the origin (x = 0). For this reason the function $M(t)$, defined by (23), is called the moment generating function (m.g.f.) of the distribution about the value x = 0. Similarly the m.g.f. about the value x = a is defined as the expected value of $\exp[t(x-a)]$. Denoting this by $M_a(t)$ we have then

$$M_a(t) = \int \exp[t(x - a)]\,\phi(x)\,dx, \qquad (25)$$

provided the integral has a meaning; and the expansion of the exponential, and term by term integration, then show that the coefficient of $t^r/r!$ is the moment of order r about the value x = a. Since the factor $e^{-at}$ is independent of x, it follows from (25) that

$$M_a(t) = e^{-at}M_0(t), \qquad (26)$$

the subscript indicating the value with respect to which the m.g.f. is constructed.

When the variate x takes only the discrete values $x_i$ with probabilities $p_i$ (i = 1, 2, ..., n), the m.g.f. with respect to the value x = a, being the expected value of $\exp[t(x-a)]$, is given by

$$M_a(t) = \sum_i p_i \exp[t(x_i - a)] = e^{-at}M_0(t) \qquad (27)$$

as before; and the coefficient of $t^r/r!$ in the expansion is the moment of order r about the value in question. The m.g.f. will be found useful, not only in calculating the moments of a distribution, but also in leading to concise proofs of various theorems. Along with it may be mentioned the characteristic function* (c.f.), which is the expected value of $e^{itx}$, where $i = \sqrt{-1}$ and t is real.

Example 1. For the exponential distribution defined by $\phi(x) = ce^{-cx}$, in which c is positive and x varies from 0 to $\infty$, the m.g.f. with respect to the origin is

$$M(t) = c\int_0^\infty \exp(tx - cx)\,dx = \frac{c}{c - t} \qquad (t < c)$$

$$= (1 - t/c)^{-1} = 1 + t/c + t^2/c^2 + \dots,$$

showing that

$$\mu'_1 = \frac{1}{c}, \qquad \mu'_2 = \frac{2!}{c^2}, \qquad \mu'_r = \frac{r!}{c^r}.$$

Thus the mean is 1/c, and the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = \frac{1}{c^2},$$

and the s.d. by $\sigma = 1/c$.

* Cf. Lévy, 1925, 3, part ii, chapters ii and iii; Cramér, 1937, 6, pp. 23-68; and Kendall, 1943, 2, pp. 90-100.



Example 2. For the rectangular distribution, $\phi(x) = 1/2a$, $-a \le x \le a$, we have

$$M_0(t) = \frac{1}{2a}\int_{-a}^{a} e^{tx}\,dx = \frac{\sinh at}{at}$$

$$= 1 + a^2t^2/3! + a^4t^4/5! + a^6t^6/7! + \dots.$$

The moments of odd order are zero, and $\mu_{2r} = a^{2r}/(2r + 1)$.

Example 3. For the binomial distribution the m.g.f. with respect to the origin is, in virtue of (27),

$$M(t) = q^n + e^t\,nq^{n-1}p + e^{2t}\binom{n}{2}q^{n-2}p^2 + \dots$$

$$= (q + pe^t)^n = (1 + pt + pt^2/2! + pt^3/3! + \dots)^n.$$

The mean, being the coefficient of t in the expansion, is np. Similarly $\mu'_2$, being the coefficient of $t^2/2!$, has the value

$$np + \binom{n}{2}\,2!\,p^2 = np[1 + (n-1)p].$$

Consequently the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = np(1 - p) = npq.$$
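
The extraction of moments from the expansion of $(q + pe^t)^n$ can be carried out mechanically with truncated power series. The following sketch uses exact rational arithmetic; the function name `binomial_mgf_series` and the figures n = 10, p = 1/4 are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def binomial_mgf_series(n, p, order):
    """Coefficients of t^0 .. t^order in (q + p e^t)^n, by truncated series multiplication."""
    q = 1 - p
    # Series of q + p(1 + t + t^2/2! + ...), truncated at the given order.
    base = [p / factorial(k) for k in range(order + 1)]
    base[0] += q
    series = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(n):
        series = [sum(series[i] * base[k - i] for i in range(k + 1))
                  for k in range(order + 1)]
    return series

n, p = 10, Fraction(1, 4)
coeffs = binomial_mgf_series(n, p, 2)
mean = coeffs[1]                      # coefficient of t
second = coeffs[2] * factorial(2)     # coefficient of t^2/2!
variance = second - mean ** 2
print(mean, second, variance)
```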

A very important property of the m.g.f. is expressed in the theorem:

The moment generating function of the sum of two independent variates is the product of their moment generating functions.

This is a direct consequence of the theorem concerning the expected value of the product of two independent variates. For, if x and y are independent variates, the m.g.f. of their sum with respect to the origin is

$$E(e^{t(x+y)}) = E(e^{tx} \cdot e^{ty}) = E(e^{tx})\,E(e^{ty}),$$

and is therefore the product of their m.g.f.'s. And, since the origin may be chosen at pleasure, the theorem holds for the m.g.f.'s about any specified value.

Let $\mu_r$, $\mu'_r$ denote the moments of x about the mean and the origin,
and $m_r$, $m'_r$ those of y. Then, by the above theorem, the m.g.f. of
x + y has an expansion obtained from the product

$$(1 + \mu'_1 t + \mu'_2 t^2/2! + \dots)(1 + m'_1 t + m'_2 t^2/2! + \dots).$$

The coefficient of t in this product is $\mu'_1 + m'_1$, so that the mean of the
sum of the variates is the sum of their means. From the coefficient
of $t^2/2!$ we find that the second moment of x + y about the origin is
$\mu'_2 + 2\mu'_1 m'_1 + m'_2$. Consequently the second moment of x + y about
its mean is

$$\mu'_2 + 2\mu'_1 m'_1 + m'_2 - (\mu'_1 + m'_1)^2 = (\mu'_2 - \mu'^2_1) + (m'_2 - m'^2_1) = \mu_2 + m_2,$$

and is thus equal to the sum of the variances of x and y. Since the
variance of −y is equal to that of y, we have again the theorem:

The variance of the sum (or difference) of two independent variates
is equal to the sum of their variances.
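
The theorem may be illustrated numerically by forming the distribution of the sum of two independent discrete variates. This is a minimal sketch: the die and the coin are assumed examples, and `convolve`, `mean` and `variance` are hypothetical helper names.

```python
from fractions import Fraction

def convolve(dist_x, dist_y):
    """Distribution of x + y for independent x and y, each given as {value: probability}."""
    out = {}
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def mean(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())

# Assumed illustrative variates: an unbiased die and an unbiased coin scored 0 or 1.
die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

total = convolve(die, coin)
difference = convolve(die, {-v: p for v, p in coin.items()})
print(variance(die), variance(coin), variance(total), variance(difference))
```

Both the sum and the difference should show a variance equal to the sum of the two separate variances, while only the means differ.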

From their definitions it is clear that the moment generating
function, when it exists, and the characteristic function, which
always exists, are determined by the probability density $\phi(x)$
of the distribution. Conversely, the distribution is determined
uniquely by its characteristic function,* or by the moment generating
function when the moments satisfy certain conditions;†
that is to say, variates which have the same moment generating
function conform to the same distribution. Proofs of these statements
are beyond the scope of this book. In the three cases in
which we shall make use of this converse theorem (pp. 49, 57-8
and 151) the moments of the distributions satisfy the necessary
conditions.

16. Cumulative function of a distribution
If the logarithm of the m.g.f. of a distribution can be expanded
as a convergent series in powers of t, viz.

$$K(t) = \log M(t) = \kappa_1 t + \kappa_2 t^2/2! + \kappa_3 t^3/3! + \dots, \tag{28}$$

the coefficients $\kappa_r$ are called the cumulants (or seminvariants‡) of
the distribution, and K(t) is the cumulative function. The cumulants

* For a proof of this converse theorem see Lévy, 1925, 3, pp. 166-7;
also Deltheil, Erreurs et Moindres Carrés, pp. 26-9 (Fasc. 2, Tome 1 of
Traité du Calcul des Probabilités et de ses Applications, ed. E. Borel),
Gauthier-Villars, Paris, 1930, and Kendall, 1943, 2, pp. 90-4.

† Cf. Kendall, 1943, 2, pp. 105-10.

‡ We adopt Fisher's terminology of cumulants (1929, 1) rather than
Thiele's of seminvariants (1903, 1), since Dressel has shown that the cumulants
are only one particular set of seminvariants (Ann. Math. Stat. vol. xi, pp.
33-57, 1940).




are determinate functions of the moments. For instance, on taking
logarithms of both members of (24) and identifying with (28), we
see that $\kappa_1 = \mu'_1$, and is thus equal to the mean of the distribution.
Since, by (26), the m.g.f. with respect to the mean is $e^{-\mu'_1 t} M_0(t)$,
it is clear on taking logarithms that the cumulative function with
respect to the mean differs from that with respect to x = 0 only
by the addition of the term $-\kappa_1 t$. Consequently the cumulative
function relative to the mean is

$$\kappa_2 t^2/2! + \kappa_3 t^3/3! + \kappa_4 t^4/4! + \dots. \tag{29}$$

And since this must be identical with

$$\log\left(1 + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \mu_4 \frac{t^4}{4!} + \dots\right),$$

we see, on comparing coefficients of like powers of t, that the first
few cumulants are given in terms of the moments of the distribution
by the formulae

$$\kappa_1 = \mu'_1 = \text{mean},$$
$$\kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2. \tag{30}$$

Thus all the cumulants after the first are independent of the value
with respect to which the cumulative function is constructed. The
mean, and the moments about the mean, may therefore be found by
calculating the cumulative function with respect to any convenient
origin. Also, from the last of the relations (30), we have

$$\frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\mu_2^2} - 3, \tag{31}$$

and this is the excess of kurtosis, as defined at the close of § 6.
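
The relations (30) and (31) lend themselves to a short computation. The sketch below assumes, purely for illustration, an exponential distribution with c = 2 (so that σ = 1/2), whose central moments are σ², 2σ³ and 9σ⁴; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def cumulants_from_moments(mean, mu2, mu3, mu4):
    """First four cumulants from the mean and the central moments, formulae (30)."""
    return mean, mu2, mu3, mu4 - 3 * mu2 ** 2

def excess_kurtosis(mu2, mu4):
    """kappa_4 / kappa_2^2 = mu_4 / mu_2^2 - 3, formula (31)."""
    return Fraction(mu4) / Fraction(mu2) ** 2 - 3

# Assumed illustrative case: exponential distribution with c = 2, i.e. sigma = 1/2.
sigma = Fraction(1, 2)
k1, k2, k3, k4 = cumulants_from_moments(sigma, sigma**2, 2 * sigma**3, 9 * sigma**4)
print(k1, k2, k3, k4, excess_kurtosis(sigma**2, 9 * sigma**4))
```

For this distribution the cumulants should come out as σ^r (r − 1)!, which gives an independent check on the formulae.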


Example. Since by § 15, Ex. 1, the m.g.f. of the exponential distribution,
with respect to the origin, is $(1 - \sigma t)^{-1}$, where $\sigma = 1/c$,
the cumulative function is

$$K(t) = -\log(1 - \sigma t) = \sigma t + \sigma^2 t^2/2 + \sigma^3 t^3/3 + \dots.$$

Thus $\kappa_r = \sigma^r (r-1)!$.

The mean, being the coefficient of t, is $\sigma$. The variance is $\sigma^2$, $\mu_3 = 2\sigma^3$, and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = (3! + 3)\sigma^4 = 9\sigma^4.$$




Further, since the m.g.f. of the sum of two independent variates,
x and y, is equal to the product of their m.g.f.'s, it follows that the
cumulative function of the distribution of x + y is the sum of those
of x and y. Equating coefficients of like powers of t we have the
simple result that the rth cumulant of x + y is the sum of the rth
cumulants of x and y. This is the additive property of cumulants. The
theorem is obviously true for the sum of any finite number of independent
variates. The particular case of second cumulants gives again
the theorem on the variance of the sum of several independent variates.

The above property of cumulants may be used to estimate the
average corrections* to be applied to the moments of a grouped
distribution, with specified class interval c, when the interval-mesh
is located at random on the ungrouped distribution. The corrections
found, which are of the same form as Sheppard's adjustments, may
be wrong in any individual instance, but their average effect in a
large number of cases will be correct. In any class the mid-value $x_0$,
from which the moments of the grouped distribution are calculated,
is the sum of the true value x of the observation and the grouping
error $x_0 - x$. In consequence of the random location of the class
limits, the grouping error is uniformly distributed over the range
$-\tfrac12 c$ to $\tfrac12 c$. The average cumulants of the grouped distribution
will differ from those of the ungrouped by the cumulants of the
grouping error. Now the m.g.f. for the uniform distribution of
error is

$$M(t) = \frac{2}{ct} \sinh(\tfrac12 ct),$$

and its cumulative function is therefore

$$K(t) = \frac{c^2}{12}\frac{t^2}{2!} - \frac{c^4}{120}\frac{t^4}{4!} + \frac{c^6}{252}\frac{t^6}{6!} - \dots.$$

If then $\kappa'_r$ are the cumulants of the grouped distribution, those of
the ungrouped distribution are

$$\kappa_1 = \kappa'_1, \quad \kappa_2 = \kappa'_2 - \frac{c^2}{12}, \quad \kappa_3 = \kappa'_3, \quad \kappa_4 = \kappa'_4 + \frac{c^4}{120}, \quad \text{etc.}$$

* Cf. Cornish and Fisher, 1937, 7, pp. 3-4, and Kendall, 1943, 2, pp. 74-5.





Denoting the moments of the grouped distribution by $m_r$ and $m'_r$,
we therefore have for those of the ungrouped distribution

$$\mu'_1 = m'_1, \quad \mu_2 = m_2 - \frac{c^2}{12}, \quad \mu_3 = m_3, \tag{32}$$

and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = \kappa'_4 + \frac{c^4}{120} + 3\left(\kappa'_2 - \frac{c^2}{12}\right)^2 = m_4 - \tfrac12 c^2 m_2 + \frac{7c^4}{240}. \tag{33}$$

The equations (32) show that the estimate $m'_1$ of the mean, and the
estimate $m_3$ of the third moment about the mean, as found from the
grouped distribution, are sufficiently accurate. The calculated
variance should be diminished by $c^2/12$; while the adjustment to
the fourth moment is given by (33).
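
The adjustments (32) and (33) may be collected into a small routine. What follows is a sketch only: the function name `sheppard_adjust` and the grouped moments used in the trial call are assumptions made for illustration, not data from the text.

```python
from fractions import Fraction

def sheppard_adjust(m1, m2, m3, m4, c):
    """Adjust grouped moments for class interval c, following (32) and (33).

    m1 is the grouped mean; m2, m3, m4 are grouped moments about the mean.
    Returns the adjusted mean and the adjusted second, third and fourth moments.
    """
    c = Fraction(c)
    mu2 = m2 - c**2 / 12
    mu4 = m4 - c**2 * m2 / 2 + 7 * c**4 / 240
    return m1, mu2, m3, mu4

# Hypothetical grouped moments and class interval, chosen only to exercise the formulae.
m1, m2, m3, m4, c = 10, 4, 1, 50, 2
mean, mu2, mu3, mu4 = sheppard_adjust(m1, m2, m3, m4, c)
print(mean, mu2, mu3, mu4)
```

The mean and third moment pass through unchanged, so only the second and fourth moments carry a correction; the fourth agrees with the cumulant route κ'₄ + c⁴/120 + 3κ₂².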


COLLATERAL READING 

Uspensky, 1937, 2, pp. 1-16, 27-36, 44-8, 60-6, 161-72 and 236-41.
Levy and Roth, 1936, 1, chapters ii-vi.
Rietz, 1927, 2, chapters i and ii.
Aitken, 1939, 1, chapters i and ii.
Plummer, 1940, 1, chapters i and ii.
Kenney, 1939, 3, part ii, chapter i.
Camp, 1934, 1, part ii, chapter i.
Poincaré, 1912, 1, chapters i-iv, vii and ix.
Coolidge, 1925, 2, chapters i-iv and vi.
Kendall, 1943, 2, chapter vii.


EXAMPLES II 


1. Two cards are drawn at random from a well-shuffled pack of
52. Show that the chance of drawing two aces is 1/221.

2. The chance of throwing a 6 at least once in two throws of a
die is 11/36.

3. A and B toss a coin alternately on the understanding that
the first to obtain heads wins the toss. Show that their respective
chances of winning are 2/3 and 1/3.

4. Four persons are chosen at random from a group containing
3 men, 2 women and 4 children. The chance that exactly two of
them will be children is 10/21.


5. From an urn containing r red and s white balls, a + b balls are
drawn at random without replacement ($a \le r$, $b \le s$). Show that the
probability of a red and b white balls is

$$\binom{r}{a}\binom{s}{b} \Big/ \binom{r+s}{a+b}.$$

6. Show that, in a single throw with two dice, the chance of
throwing more than 7 is equal to that of throwing less than 7,
each being 5/12.

7. A and B take turns in throwing two dice, the first to throw 9 
being awarded the prize. Show that their chances of winning are 
in the ratio 9:8. 

8. Three men toss in succession for a prize to be given to the one 
who first obtains heads. Show that their chances of winning are 
4/7, 2/7 and 1/7. 

9. Eight coins are thrown simultaneously. Show that the chance 
of obtaining at least six heads is 37/256. 

10. The expectation of the number of failures preceding the first
success in an indefinite series of independent trials, with constant
probability p of success, is

$$qp + 2q^2p + 3q^3p + \dots = \frac{qp}{(1 - q)^2} = \frac{q}{p}.$$

(Uspensky, 1937, 2, p. 178, Ex. 3.)

11. A point P is taken at random in a line AB, of length 2a, all
positions of the point being equally likely. Show that the expected
value of the area of the rectangle AP . PB is $2a^2/3$, and that the
probability of the area exceeding $\tfrac12 a^2$ is $1/\sqrt{2}$.

12. From a point on the circumference of a circle of radius a,
a chord is drawn in a random direction (i.e. all directions are equally
likely). Show that the expected value of the length of the chord
is $4a/\pi$, and that the variance of the length is $2a^2(1 - 8/\pi^2)$. Also
show that the chance is 1/3 that the length of the chord will exceed
the length of the side of an equilateral triangle inscribed in the
circle.

13. A chord of a circle of radius a is drawn parallel to a given
straight line, all distances from the centre of the circle being equally
likely. Show that the expected value of the length of the chord is
$\tfrac12 \pi a$, and that the variance of the length is $\frac{a^2}{12}(32 - 3\pi^2)$. Also show
that the chance is 1/2 that the length of the chord will exceed the
length of the side of an equilateral triangle inscribed in the circle.

14. Two different digits are chosen at random from the set
1, 2, 3, ..., 8. Show that the probability that the sum of the digits
will be equal to 6 is the same as the probability that their sum will
exceed 13, each being 1/14. Also show that the chance of both digits
exceeding 5 is 3/28.

15. In Poisson's distribution the variate takes the values
0, 1, 2, 3, ... with probabilities proportional to 1, m, $m^2/2!$, $m^3/3!$, ...,
both sequences being infinite. Show that the mean of the distribution
is m, and the variance also m.

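16. The equation § 15 (26), written for $a = \bar{x} = \mu'_1$, is equivalent to

$$1 + \mu_2 t^2/2! + \mu_3 t^3/3! + \dots = (1 - \bar{x}t + \bar{x}^2 t^2/2! - \dots)(1 + \mu'_1 t + \mu'_2 t^2/2! + \dots).$$

Equating coefficients of like powers of t, deduce the relations
§ 4 (17), (18).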

17. Prove that the next two formulae corresponding to § 16 (30)
are

$$\kappa_5 = \mu_5 - 10\mu_3\mu_2, \quad \kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3.$$

18. Two independent variates are each uniformly distributed
within the range −a to a. Show that their sum x has a probability
density given by

$$\phi(x) = (2a + x)/4a^2 \quad (-2a \le x \le 0),$$
$$\phi(x) = (2a - x)/4a^2 \quad (0 \le x \le 2a).$$

Verify that the m.g.f., calculated from this value of $\phi(x)$, is equal
to $(\sinh at / at)^2$.

19. On the x-axis n + 1 points are taken independently between
the origin and x = 1, all positions being equally likely. Show that
the probability that the (k + 1)th of these points, counted from the
origin, lies in the interval $x - \tfrac12 dx$ to $x + \tfrac12 dx$ is

$$(n + 1)\binom{n}{k} x^k (1 - x)^{n-k}\,dx.$$

Verify that the integral of this expression, from x = 0 to x = 1, is
unity. (Aitken, 1939, 1, p. 71.)

20. Show that, for the binomial distribution,

$$\kappa_4 = npq(1 - 6pq).$$

21. Show that the expected value of the product of the numbers
of points showing after an unbiased throw of n ordinary dice is $(7/2)^n$.

22. Show that, if p may be varied, the probability of m
successes in a series of n independent trials, with the same probability
p of success, is greatest when p = m/n.

23. Show that, if y and z are independent random values of a
variate x, the expected value of $(y - z)^2$ is twice the variance of the
distribution of x.

24. Defining the harmonic mean (h.m.) of a variate x as the
reciprocal of the expected value of 1/x, show that the h.m. of the
variate which ranges from 0 to ∞ with probability density $x^n e^{-x}/n!$
is n, given that n is positive.



CHAPTER III


SOME STANDARD DISTRIBUTIONS 
Binomial and Poissonian Distributions 


17. The binomial distribution 

We have considered the binomial distribution in connection
with the probabilities of the various numbers of successes in a
series of n independent trials, in each of which the chance of success
is equal to p. The mean and the variance of the distribution were
determined by means of the property that the expected value of a
sum of variates is equal to the sum of their expected values. These
may also be found by direct calculation. Thus

$$E(x) = 0 \cdot q^n + 1 \cdot n q^{n-1} p + 2\binom{n}{2} q^{n-2} p^2 + \dots + n p^n$$
$$= np\left[q^{n-1} + (n-1) q^{n-2} p + \dots + p^{n-1}\right]$$
$$= np(q + p)^{n-1} = np. \tag{1}$$

Thus the mean of the distribution is np. In order to find the variance
we calculate first the second moment about x = 0. Thus

$$\mu'_2 = 0^2 \cdot q^n + 1^2 \cdot n q^{n-1} p + 2^2 \binom{n}{2} q^{n-2} p^2 + \dots + n^2 p^n$$
$$= np\left[q^{n-1} + 2(n-1) q^{n-2} p + 3\binom{n-1}{2} q^{n-3} p^2 + \dots + n p^{n-1}\right].$$

Now the expression in brackets is the first moment, about the value
x = −1, for the binomial distribution in which n is replaced by
n − 1. This first moment, being the excess of the mean of the distribution
above x = −1, is equal to (n − 1)p + 1, in virtue of (1).
Consequently

$$\mu'_2 = np[(n-1)p + 1],$$

and the variance of the binomial distribution is then given by

$$\mu_2 = \mu'_2 - \{E(x)\}^2 = np[1 + (n-1)p] - (np)^2 = np(1 - p) = npq, \tag{2}$$

and the standard deviation by

$$\sigma = \sqrt{npq}. \tag{3}$$

The proportion or relative frequency of successes is the number of
successes divided by n. Hence the mean value of this proportion
is p, and its standard deviation is $\sqrt{pq/n}$.

A binomial frequency distribution is one in which the relative
frequencies of the values 0, 1, 2, ..., n of the variate are equal to
their probabilities in the above distribution. As an example we may
take the distribution of expected frequencies of 0, 1, ..., n successes
when the set of n trials is to be made N times. For, in virtue of (1)
and the known probability of r successes, the expected frequency of
r successes in N sets of n trials each is $N\binom{n}{r} p^r q^{n-r}$. The properties
of the distribution are the same whether it is regarded as one of
probability or one of frequency. But the reader may note that, in
a theoretical frequency distribution like the above, the individual
frequencies are not necessarily integral.

Example 1. Verify the above value of $\mu'_2$ by direct algebraical simplification
of the expression.

Example 2. Show that the m.g.f. of the binomial distribution with respect
to its mean is

$$(q e^{-pt} + p e^{qt})^n = \left[1 + pq\frac{t^2}{2!} + pq(q^2 - p^2)\frac{t^3}{3!} + pq(q^3 + p^3)\frac{t^4}{4!} + \dots\right]^n,$$

and deduce that

$$\mu_2 = npq, \quad \mu_3 = npq(q - p), \quad \mu_4 = npq[1 + 3(n - 2)pq].$$


18. Poisson’s distribution 

An important distribution, associated with the name of Poisson,
is one obtainable from the binomial distribution by putting p = m/n,
where m is a constant, and letting n increase indefinitely. Thus the
number of trials in the series becomes very large, and the probability
of success in a trial very small. Now it can be shown* that,
on the above assumption as n tends to infinity,

$$\lim \binom{n}{r} p^r q^{n-r} = \frac{m^r e^{-m}}{r!}.$$

* See Mathematical Note I, at the end of this chapter.




Thus, in the limiting form of the distribution, the probability of r
successes in the infinite series of trials is $m^r e^{-m}/r!$. The chances of
0, 1, 2, 3, ... successes in the infinite series are

$$e^{-m}, \quad m e^{-m}, \quad \frac{m^2}{2!} e^{-m}, \quad \frac{m^3}{3!} e^{-m}, \quad \dots \tag{4}$$

respectively. The reader may show, as in the case of the binomial
distribution, that the most probable number of successes is the
integral part of m. It is obvious that the sum of the probabilities
of 0, 1, 2, ... successes is $e^{-m} e^{m} = 1$, as it should be.

The mean value and the variance of Poisson's distribution may be
deduced from those of the binomial by putting p = m/n, and
letting n tend to infinity. Thus the mean is lim np = m.
Similarly, the variance is lim npq = lim mq = m, since q tends to
unity as p tends to zero. These results may, of course, be deduced
by direct calculation. Thus the mean, being the first moment about
x = 0, is given by

$$\bar{x} = \mu'_1 = e^{-m}(0 + 1 \cdot m + 2 \cdot m^2/2! + 3 \cdot m^3/3! + \dots)$$
$$= m e^{-m}(1 + m + m^2/2! + m^3/3! + \dots)$$
$$= m. \tag{5}$$

Similarly

$$\mu'_2 = m(m + 1),$$

as the reader may easily verify. Consequently

$$\mu_2 = \mu'_2 - \mu'^2_1 = m(m + 1) - m^2 = m, \tag{6}$$

as found above. The s.d. is therefore $\sqrt{m}$.

The same results may be obtained by using the generating
functions of §§ 15 and 16. Thus the m.g.f. of the Poissonian distribution
with respect to the origin is

$$M_0(t) = e^{-m}(1 + e^t m + e^{2t} m^2/2! + e^{3t} m^3/3! + \dots)$$
$$= e^{-m} \exp(m e^t) = \exp[m(e^t - 1)],$$

and the cumulative function is therefore

$$K(t) = m(e^t - 1) = m(t + t^2/2! + t^3/3! + \dots).$$

Thus $\kappa_1 = \kappa_2 = \kappa_3 = \dots = m$. The mean and the variance are each
equal to m, as is also the third moment about the mean. The fourth
moment about the mean is

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = m + 3m^2.$$

Further, if n independent variates conform to Poissonian
distributions with means $m_i$ (i = 1, 2, ..., n), it follows from the
theorem of § 15 that the m.g.f. of their sum is $\exp[(e^t - 1)\sum_i m_i]$.
But this is the m.g.f. of a Poissonian distribution whose mean is
$\sum_i m_i$. Hence the theorem:

The sum of any finite number of independent Poissonian variates
is itself a Poissonian variate, with mean equal to the sum of the means
of the separate variates.*


A Poissonian frequency distribution is one in which the relative 
frequencies of the values 0,1, 2, 3,... of the variate are equal to the 
probabilities in the above distribution. As an example we have the 
distribution of the expected frequencies of the various numbers of 
successes when the extensive series of trials is repeated N times. 
But in such a theoretical distribution the individual frequencies 
are not necessarily integral. Frequency distributions which are 
approximately Poissonian do arise in connection with the number 
of happenings of a rare event in an extensive series of trials. 


Example. In 1,000 consecutive issues of The Utopian Seven-daily Chronicle
the deaths of centenarians were recorded,† the number x having frequency
f according to the table

x:    0    1    2    3    4    5    6    7    8
f:  229  325  257  119   50   17    2    1    0

Show that the distribution is roughly Poissonian by calculating its mean
(m = 1.5), and then the frequencies in the Poissonian distribution with the
same mean and the same total frequency of 1,000. The latter are approximately
223.1, 334.7, 251.0, 125.5, 47.1, 14.1, 3.5, 0.8, 0.2. Also calculate the
variance of the given distribution, and compare it with the mean.
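
The working of this example can be sketched in a few lines. The frequencies below are taken as 229, 325, 257, 119, 50, 17, 2, 1, 0, which is the reading of the table that totals 1,000 and gives a mean of 1.5; the variable names are illustrative assumptions.

```python
from math import exp, factorial

# Observed frequencies of x = 0, 1, ..., 8 (assumed reading of the table; they total 1,000).
freq = [229, 325, 257, 119, 50, 17, 2, 1, 0]

total = sum(freq)
mean = sum(x * f for x, f in enumerate(freq)) / total
variance = sum(f * (x - mean) ** 2 for x, f in enumerate(freq)) / total

# Expected frequencies in the Poissonian distribution with the same mean and total.
expected = [total * exp(-mean) * mean**x / factorial(x) for x in range(len(freq))]

print(total, mean, variance)
print([round(e, 1) for e in expected])
```

The variance found in this way lies close to the mean, as it should for a distribution that is roughly Poissonian.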


* An elementary proof of this theorem will be found in Ex. 4 at the end
of this chapter.

† Cf. Lucy Whittaker, Biometrika, vol. x, 1914, p. 36.




The Normal Distribution 


19. Derivation from the binomial distribution 

The binomial and Poissonian are distributions of discrete values.
We pass now to a continuous distribution of fundamental importance.
This is the normal distribution, which may be derived from the
binomial in the following manner. In the latter the probability of
the value r of the variate is $\binom{n}{r} p^r q^{n-r}$. Let x denote the deviation of
the variate from the mean value np of r, so that

$$r = np + x. \tag{7}$$

Then the probability of the value x, being the same as that of the
corresponding value of r, is

$$P = \binom{n}{r} p^r q^{n-r} = \binom{n}{np + x} p^{np+x} q^{nq-x}.$$

It can be shown* that, for large values of n, this probability can be
expressed

$$P = \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{x^2}{2npq}\right)(1 + \epsilon), \tag{8}$$

where $\epsilon$ tends to zero as n tends to infinity, provided that neither
p nor q is very small, and x is of lower order than $n^{2/3}$.
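
The closeness of the approximation for large n can be examined numerically. The sketch below compares the exact binomial probability with the leading factor exp(−x²/2npq)/√(2πnpq) of the approximation; the values n = 400, p = 1/2 and the function names are assumptions chosen for illustration.

```python
from math import comb, exp, pi, sqrt

def binomial_prob(n, p, r):
    """Exact probability of r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def normal_approx(n, p, x):
    """Leading factor of the approximation for a deviation x from the mean np."""
    npq = n * p * (1 - p)
    return exp(-x * x / (2 * npq)) / sqrt(2 * pi * npq)

# Assumed illustrative case: n = 400, p = 1/2, so that np = 200 and npq = 100.
n, p = 400, 0.5
for x in (0, 10, 20):
    exact = binomial_prob(n, p, int(n * p) + x)
    approx = normal_approx(n, p, x)
    print(x, exact, approx, exact / approx - 1)
```

The last column printed is the quantity ε of the text for each deviation, and it should be small compared with unity for these values.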

Now introduce a variable z defined by

$$z = x/\sqrt{n},$$

and examine the probability distribution of z, and its limiting form
as n tends to infinity. The probability for any interval in the range
of z is equal to that of the corresponding interval for x. To unit
interval in the range of x there corresponds the interval $1/\sqrt{n}$ in
that of z; and, when n tends to infinity, this may be denoted by dz.
Then the probability that z falls in the interval dz is the probability
that x falls in unit interval which includes the value x; and this is
given by (8) which, when n tends to infinity, takes the form

$$\frac{1}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right) dz.$$

* For the details of this step see Mathematical Note II, at the end of this
chapter.


The limiting form of the distribution of z is thus a continuous distribution
with probability density

$$\phi(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{z^2}{2\sigma^2}\right), \tag{9}$$

where $\sigma^2 = pq$. This is the normal probability function, and the
corresponding continuous distribution is the normal distribution.
Since $\sqrt{npq}$ is the s.d. of the binomial distribution, and therefore
of the variate x, it follows that $\sqrt{pq}$ is the s.d. of z. Hence $\sigma$ in (9)
is the s.d. of the normal distribution. Since $\phi(z)$ is an even function
of z, the distribution is symmetrical about z = 0, which is therefore
the mean value of the distribution. Also, since z must lie between
−∞ and ∞, the integral of (9) between these limits is equal to
unity, so that

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}. \tag{10}$$

This is an important integral. An independent proof of the formula 
is given in Mathematical Note III, at the end of this chapter. 

A normal frequency distribution is a continuous one, in which
the relative frequency density f(x) is identical with the function $\phi(x)$
defined by (9). A distribution of discrete values cannot be normal.
If, however, the total frequency is large, it is possible for the discrete
distribution to approximate to the normal. The meaning of such an
approximation was explained in § 6.

In later chapters, particularly in connection with sampling theory,
we shall meet distributions which are accurately normal, and others
which are approximately so. Since the normal distribution is an
ideal to which some distributions attain, and to which others
approximate, it is important to know its properties. Fortunately,
the quantities associated with the distribution are easy to calculate.

20. Some properties of the normal distribution 

As we have just seen, a continuous variate x is normally distributed,
with mean zero and s.d. $\sigma$, when the range of the variate
is from −∞ to ∞, and the probability density is given by

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{11}$$

The probability curve is therefore the curve

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{12}$$

This curve is symmetrical about the line x = 0 through the mean of
the distribution. It is a uni-modal curve, in which the ordinates
decrease rapidly as |x| increases. By equating to zero the second
derivative of y, the reader will easily verify that the points of inflexion
on the curve are given by $x = \pm\sigma$. The ordinates of the
normal curve are given in Table 1, corresponding to values of $x/\sigma$
at intervals of 0.01.



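Since the distribution is symmetrical, the moments of odd order
about the mean are all zero. To find the moments of even order
observe that the moment of order 2n about the mean is

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Integration by parts then gives

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \left\{ \left[-\sigma^2 x^{2n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right]_{-\infty}^{\infty} + (2n-1)\sigma^2 \int_{-\infty}^{\infty} x^{2n-2} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx \right\}.$$

The expression in square brackets vanishes at both limits; and we
may therefore write the relation

$$\mu_{2n} = (2n-1)\,\sigma^2 \mu_{2n-2}.$$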




Table 1. Ordinates of the Normal Curve

The origin of x is at the mean. The table gives the values of
$\frac{1}{\sqrt{2\pi}} \exp(-x^2/2\sigma^2)$, which is $\sigma$ times the ordinate of the normal curve $y = \phi(x)$

x/σ    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
0.1   .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
0.2   .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
0.3   .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
0.4   .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
0.5   .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
0.6   .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
0.7   .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
0.8   .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
0.9   .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
1.0   .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203
1.1   .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965
1.2   .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736
1.3   .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518
1.4   .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315
1.5   .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
1.6   .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
1.7   .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
1.8   .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
1.9   .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
2.0   .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
2.1   .0440  .0431  .0422  .0413  .0404  .0395  .0387  .0379  .0371  .0363
2.2   .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
2.3   .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
2.4   .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
2.5   .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
2.6   .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
2.7   .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
2.8   .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
2.9   .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
3.0   .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
3.1   .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
3.2   .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
3.3   .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
3.4   .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
3.5   .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
3.6   .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
3.7   .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
3.8   .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
3.9   .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001




Repeated application of this reduction formula shows that

$$\mu_{2n} = (2n-1)(2n-3) \dots 3 \cdot 1 \cdot \sigma^{2n} \mu_0.$$

But, for any distribution, $\mu_0 = 1$, and therefore

$$\mu_{2n} = (2n-1)(2n-3) \dots 3 \cdot 1 \cdot \sigma^{2n}. \tag{13}$$

In particular, the variance is $\sigma^2$ and the s.d. is $\sigma$, as proved above.
Similarly,

$$\mu_4 = 3\sigma^4. \tag{14}$$
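
Formula (13) may be compared with a direct numerical integration of the normal density. This is a rough sketch only: the midpoint rule, the step count, the integration width and the value σ = 1.5 are all assumptions made for the illustration.

```python
from math import exp, pi, sqrt

def even_moment(n, sigma):
    """mu_{2n} = (2n-1)(2n-3)...3.1 * sigma^(2n), as in formula (13)."""
    product = 1
    for odd in range(1, 2 * n, 2):
        product *= odd
    return product * sigma ** (2 * n)

def numeric_moment(order, sigma, steps=20000, width=12.0):
    """Midpoint-rule estimate of the moment of the given order about the mean."""
    lower = -width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        density = exp(-x * x / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
        total += x**order * density * h
    return total

sigma = 1.5  # assumed illustrative value
for n in (1, 2, 3):
    print(2 * n, even_moment(n, sigma), numeric_moment(2 * n, sigma))
```

The two columns should agree closely for each even order, and the same routine applied to an odd order should return a value indistinguishable from zero.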

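The moments may also be obtained from the moment generating
function of the normal distribution. This function, relative to the
mean x = 0, is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{x^2}{2\sigma^2}\right) dx$$
$$= \exp(\tfrac12 \sigma^2 t^2)\,\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{(x - \sigma^2 t)^2}{2\sigma^2}\right] dx$$
$$= \exp(\tfrac12 \sigma^2 t^2), \tag{15}$$

in virtue of (10). Since the expansion of this expression involves
only even powers of t, the moments of odd order about the mean are
zero, as is obvious from symmetry. The moment of order 2n is the
coefficient of $t^{2n}/(2n)!$ in the expansion; and this has the value

$$\mu_{2n} = \frac{(2n)!}{n!} (\tfrac12 \sigma^2)^n = 1 \cdot 3 \cdot 5 \dots (2n-1)\,\sigma^{2n},$$

as found above. The cumulative function is, in virtue of (15),

$$K(t) = \log M(t) = \tfrac12 \sigma^2 t^2,$$

so that all the cumulants after the second are equal to zero.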

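The mean deviation from the mean is easily calculated. Its value is

$$\int_{-\infty}^{\infty} |x|\,\phi(x)\,dx = 2\int_0^{\infty} x\,\phi(x)\,dx$$

in virtue of symmetry; and this integral

$$= \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} x \exp\left(-\frac{x^2}{2\sigma^2}\right) dx = \sigma\sqrt{2/\pi}$$
$$= 0.7979\sigma = \tfrac45\sigma \text{ approximately.}$$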



In a normal distribution with mean $\bar{x}$ and s.d. $\sigma$, $x - \bar{x}$ is the
deviation of the variable from the mean, and the probability density is

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \bar{x})^2}{2\sigma^2}\right]. \tag{16}$$

Conversely, a probability density of this form defines a normal
distribution with mean $\bar{x}$ and s.d. $\sigma$.

21. Probabilities and relative frequencies for various intervals
The probability corresponding to any interval in the range of
the variate is represented by the area under the curve (12) within
that interval; and the same is true for the relative frequency in the
case of a normal frequency distribution. In particular, the probability
for the interval from the mean (zero) to the value x is given
by the integral

$$\frac{1}{\sigma\sqrt{2\pi}} \int_0^x \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Putting $t = x/\sigma$ we see that this is equivalent to

$$\frac{1}{\sqrt{2\pi}} \int_0^{x/\sigma} \exp(-\tfrac12 t^2)\,dt. \tag{17}$$

This probability is therefore a function of $x/\sigma$. In the accompanying
table the values of this integral are given for different
values of $x/\sigma$ at intervals of 0.01.

Using Table 2 the reader will see that, if $x/\sigma = 1$, the area is
0.3413. The area within the interval $x = -\sigma$ to $x = \sigma$ is therefore
0.6826, and the area outside this interval 0.3174, which is less than
1/3. In other words, the probability that a random value of a normal
variate will deviate more than $\sigma$ from the mean, is less than 1/3.
Similarly, for $x/\sigma = 2$ the value of the integral (17) is 0.4772, and
the area for the interval $x = -2\sigma$ to $x = 2\sigma$ is 0.9544. The area
outside this interval is thus 0.0456. The probability that a random
value of x will deviate more than $2\sigma$ from the mean is thus about
4½ %. Similarly, the area outside the range $x = -2.5\sigma$ to $x = 2.5\sigma$
is 0.0124, which is about 1¼ % of the whole; and that outside the
range $x = -3\sigma$ to $x = 3\sigma$ is only 0.0027, or about ¼ % of the whole.
The reader may verify, and will find it convenient to remember,
that the deviation from the mean which is exceeded with a probability
of 5 % is $1.96\sigma$; and that which is exceeded with a
probability of 1 % is $2.58\sigma$.
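
These areas can also be obtained without the table, since the integral (17) is expressible through the error function. The sketch below uses `math.erf` from the Python standard library; the helper names `area_from_mean` and `prob_outside` are assumptions introduced for the illustration.

```python
from math import erf, sqrt

def area_from_mean(z):
    """Area under the normal curve between the mean and z standard deviations, integral (17)."""
    return 0.5 * erf(z / sqrt(2))

def prob_outside(k):
    """Probability that a normal variate deviates more than k standard deviations from its mean."""
    return 1 - 2 * area_from_mean(k)

for k in (1, 2, 2.5, 3, 1.96, 2.58):
    print(k, round(area_from_mean(k), 4), round(prob_outside(k), 4))
```

The printed values may be set beside those quoted above for deviations of one, two, two and a half and three times the standard deviation.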


Table 2. Area under the Normal Curve

The area is measured from the mean, x = 0, to any ordinate, x > 0.
The results are given for values of x/σ at intervals of 0.01

x/σ    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   .0000  .0040  .0080  .0120  .0159  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2996  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4430  .4441
1.6   .4452  .4463  .4474  .4485  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4762  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4865  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4980  .4980  .4981
2.9   .4981  .4982  .4983  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .49865 .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
3.1   .49903 .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993



From Table 2 we may find the quartiles of the normal distribution with mean at x = 0. The relative frequency from the mean to the upper quartile is 0.25, which is therefore the corresponding area under the curve. Interpolating with the aid of Table 2 we find x/σ = 0.6745, and the upper quartile is therefore 0.6745σ. Similarly, the lower quartile is -0.6745σ.
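The interpolated value can be checked against an inverse normal routine. This is a small sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal: mean 0 and s.d. 1, so results are in units of sigma.
std = NormalDist()

# The upper quartile has cumulative probability 0.75, i.e. area 0.25 from the mean.
upper_quartile = std.inv_cdf(0.75)
lower_quartile = std.inv_cdf(0.25)

print(round(upper_quartile, 4), round(lower_quartile, 4))
```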


Example 1. Using Table 2 show that the 6th, 7th, 8th and 9th deciles are 0.253σ, 0.525σ, 0.842σ and 1.282σ respectively.




Example 2. For a normal distribution, with mean x = 1 and s.d. 3, find the probabilities for the intervals

(i) x = 3.43 to x = 6.19, (ii) x = -1.43 to x = 6.19.

Let x' denote the deviation from the mean. Then, in the first part of the example, the values of x'/σ for the bounds of the interval are 0.81 and 1.73. The areas from the mean to the bounds of the interval are 0.2910 and 0.4582. The area corresponding to the interval is the difference of these areas, and is therefore 0.1672.

In the second part the values of x'/σ corresponding to the bounds of the interval are -0.81 and 1.73. The area from the lower bound to the mean is 0.2910, and that from the mean to the upper bound is 0.4582 as before. The required area is the sum of these, viz. 0.7492.
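The two probabilities of Example 2 can be confirmed directly from the cumulative distribution. The sketch below assumes the standard-library `statistics.NormalDist`; `interval_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

# Normal distribution of Example 2: mean 1, s.d. 3.
dist = NormalDist(mu=1, sigma=3)


def interval_probability(lo: float, hi: float) -> float:
    """Probability that the variate falls between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)


p_i = interval_probability(3.43, 6.19)    # both bounds above the mean
p_ii = interval_probability(-1.43, 6.19)  # bounds on opposite sides of the mean

print(round(p_i, 4), round(p_ii, 4))
```

Small differences in the last decimal place from the tabulated answers are to be expected, since the text rounds x'/σ to two places before entering the table.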


22. Distribution of a sum of independent normal variates 

Consider the normal variate $x$, with mean $a$ and variance $\sigma^2$. Its m.g.f. with respect to the origin is

$$M(t) = \exp\left(at + \tfrac{1}{2}\sigma^2 t^2\right) \tag{18}$$

in virtue of (10). The cumulative function is

$$K(t) = \log M(t) = at + \tfrac{1}{2}\sigma^2 t^2.$$

If $c$ is a constant, $cx$ is normally distributed with mean $ca$ and variance $c^2\sigma^2$. Hence the m.g.f. of the distribution of $cx$ is

$$\exp\left(cat + \tfrac{1}{2}c^2\sigma^2 t^2\right).$$

Let $x_i$ $(i = 1, \ldots, n)$ be $n$ independent normal variates, with means $a_i$ and variances $\sigma_i^2$. Then, by the theorem of § 15, the m.g.f. of the sum $\sum c_i x_i$ is

$$\exp\left[t\sum c_i a_i + \tfrac{1}{2}t^2\sum c_i^2\sigma_i^2\right].$$

But this is the m.g.f. of a normal distribution with mean $\sum c_i a_i$ and variance $\sum c_i^2\sigma_i^2$. Hence the theorem:

If the independent variates $x_i$ $(i = 1, \ldots, n)$ are normally distributed with means $a_i$ and variances $\sigma_i^2$, the variate $\sum c_i x_i$ is normally distributed with mean $\sum c_i a_i$ and variance $\sum c_i^2\sigma_i^2$.

In particular, the sum (or the difference) of two independent normal variates is normally distributed, with variance equal to the sum of their variances. Also, if in the above theorem we put

$$a_i = a, \quad \sigma_i = \sigma, \quad c_i = 1/n \quad (i = 1, 2, \ldots, n),$$

we have the important result:

If the independent variates $x_i$ $(i = 1, \ldots, n)$ are normally distributed about a common mean, $a$, with a common variance, $\sigma^2$, their mean is also normally distributed about $a$, but with variance $\sigma^2/n$.

Other proofs of the above theorem will be found in Ex. 8 at the end of this chapter.
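The bookkeeping in the theorem is easy to mechanize. The following sketch computes only the mean and variance of a linear combination $\sum c_i x_i$ (it does not test normality); the function name `linear_combination` and the numerical values are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction


def linear_combination(c, a, var):
    """Mean and variance of sum(c_i * x_i) for independent variates x_i
    with means a_i and variances var_i."""
    mean = sum(ci * ai for ci, ai in zip(c, a))
    variance = sum(ci ** 2 * vi for ci, vi in zip(c, var))
    return mean, variance


# Mean of n = 4 variates with common mean 10 and common variance 9: c_i = 1/n.
n = 4
mean_of_mean, var_of_mean = linear_combination(
    [Fraction(1, n)] * n, [Fraction(10)] * n, [Fraction(9)] * n
)

# Difference of two variates: c = (1, -1); the variances still add.
diff_mean, diff_var = linear_combination([1, -1], [5, 2], [4, 9])

print(mean_of_mean, var_of_mean, diff_mean, diff_var)
```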

COLLATERAL READING 

Yule and Kendall, 1937, 1, chapter x.
Kenney, 1939, 3, part i, chapter vi.
Aitken, 1939, 1, chapter iii.
Jones, 1924, 3, chapter xviii.
Camp, 1934, 1, part i, chapters v and vi.
Fisher, 1938, 2, chapter iii.
Rider, 1939, 6, chapter v (§§ 30-32).
Plummer, 1940, 1, chapter iii.
Mills, 1938, 1, chapter xiii.
Goulden, 1939, 2, chapter iii.
Rietz (ed.), 1924, 1, chapter vii (to p. 103).
Bowley, 1920, 2, part ii, chapters ii and iii.
Coolidge, 1925, 2, chapter vii.
Levy and Roth, 1936, 1, chapter viii.


EXAMPLES III 

1. Show by direct calculation that the third moment of the binomial distribution about $x = 0$ is

$$\mu'_3 = np + 3n(n-1)p^2 + n(n-1)(n-2)p^3,$$

and deduce that $\mu_3 = npq(q - p)$. Similarly, show that

$$\mu_4 = npq[1 + 3(n-2)pq].$$

2. Show that, for Poisson's distribution, the third and fourth moments about $x = 0$ are given by

$$\mu'_3 = m[(m+1)^2 + m], \quad \mu'_4 = m(m^3 + 6m^2 + 7m + 1),$$

and deduce that

$$\mu_3 = m, \quad \mu_4 = 3m^2 + m.$$

Deduce these results also as limiting values of the corresponding moments in Ex. 1.
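The binomial results of Ex. 1 can be checked by brute-force summation in exact rational arithmetic. This is only a numerical check for one hypothetical choice of $n$ and $p$, not a proof; the helper name `binomial_central_moment` is an assumption of the sketch.

```python
from fractions import Fraction
from math import comb


def binomial_central_moment(n: int, p: Fraction, k: int) -> Fraction:
    """k-th moment about the mean np of the binomial distribution, by direct summation."""
    q = 1 - p
    mean = n * p
    return sum(
        comb(n, r) * p ** r * q ** (n - r) * (r - mean) ** k for r in range(n + 1)
    )


n, p = 7, Fraction(1, 3)  # hypothetical values
q = 1 - p

mu3 = binomial_central_moment(n, p, 3)
mu4 = binomial_central_moment(n, p, 4)

print(mu3, n * p * q * (q - p))
print(mu4, n * p * q * (1 + 3 * (n - 2) * p * q))
```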

3. In a normal distribution, whose mean is 2 and s.d. 3, find a value of the variate such that the probability of the interval from the mean to that value is 0.4115. (Ans. x = 6.05.) Find another value such that the probability for the interval from x = 3.5 to that value is 0.2307. (Ans. x = 6.26.)

4. If two independent variates, $x$ and $y$, have Poissonian distributions with means $m_1$ and $m_2$, their sum is a Poissonian variate with mean $m_1 + m_2$. (Cf. § 18.)

The variates take only the values 0, 1, 2, 3, .... We require the probability that their sum will take the value $r$. The probability that simultaneously $x$ will have the value $s$ and $y$ the value $r - s$ is

$$\frac{m_1^s \exp(-m_1)}{s!} \times \frac{m_2^{r-s} \exp(-m_2)}{(r-s)!}.$$

Summing for all values of $s$ from 0 to $r$, we see that the probability that $x + y$ will have the value $r$ is

$$\exp(-m_1 - m_2) \sum_{s=0}^{r} \frac{m_1^s m_2^{r-s}}{s!\,(r-s)!} = \frac{(m_1 + m_2)^r \exp(-m_1 - m_2)}{r!}.$$

Consequently $x + y$ is a Poissonian variate with mean $m_1 + m_2$. It follows that, if the independent variates $x_i$ have Poissonian distributions with means $m_i$ $(i = 1, \ldots, n)$, their sum is a Poissonian variate with mean $\sum m_i$.
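The summation over $s$ in Ex. 4 can be carried out numerically and compared with the Poissonian probability for mean $m_1 + m_2$. A minimal sketch, with hypothetical means and illustrative function names:

```python
from math import exp, factorial


def poisson_pmf(r: int, m: float) -> float:
    """Probability of the value r in a Poissonian distribution with mean m."""
    return m ** r * exp(-m) / factorial(r)


def sum_pmf(r: int, m1: float, m2: float) -> float:
    """Probability that x + y = r, summing over x = s, y = r - s for s = 0..r."""
    return sum(poisson_pmf(s, m1) * poisson_pmf(r - s, m2) for s in range(r + 1))


m1, m2 = 1.2, 2.5  # hypothetical means

# Largest discrepancy between the convolution and the Poissonian law with mean m1 + m2.
max_gap = max(
    abs(sum_pmf(r, m1, m2) - poisson_pmf(r, m1 + m2)) for r in range(20)
)
print(max_gap)
```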

5. Consider the normal distribution which has the same mean (3.95 lb.) and the same s.d. (0.46 lb.) as the distribution of yields of grain in Ex. I, 2. Find the relative frequencies of the normal distribution for the intervals corresponding to the various classes, and deduce the class frequencies per 500 of the normal distribution. This process is sometimes spoken of as fitting a normal distribution to the data.

Take, for example, the class whose limits are 4.1 and 4.3 lb. Since the mean is 3.95 we have, for the lower limit, x/σ = 0.15/0.46 = 0.3261. The area from the mean to this limit is therefore 0.1278, and the area to the right of it is 0.3722. Similarly, for the upper limit x/σ = 0.35/0.46 = 0.7609; and the area to the right of this is 0.2233. The difference of the areas to the right of the limits is that corresponding to the interval. Its value is 0.1489. This is the relative frequency for that interval.

The work can be tabulated as below. Here x denotes the deviation of a class limit from the mean 3.95 lb. Knowing that σ = 0.46 lb. we can find x/σ for each limit. The third column gives the area under the normal curve, to the right of the class limit. The differences, recorded in the next column, are the relative frequencies for the various classes. These differences, multiplied by 500, give the normal frequencies of the classes per 500 of the total. The last column records the class frequencies in the given distribution. The lower limit of the lowest class is taken as -∞, and the upper limit of the highest class as ∞, so as to include the whole of the normal distribution.


Fitting a normal distribution to that of 500 yields of grain

Each difference d, with its 500d and observed frequency, refers to the class whose lower limit stands in the same row.

Class limit    x/σ        Area to right   Difference d   500d    Observed
-∞             -∞         1.0000          0.0112          5.6      4
2.9            -2.2826    0.9888          0.0212         10.6     15
3.1            -1.8479    0.9676          0.0464         23.2     20
3.3            -1.4130    0.9212          0.0851         42.6     47
3.5            -0.9783    0.8361          0.1295         64.7     63
3.7            -0.5435    0.7066          0.1633         81.6     78
3.9            -0.1087    0.5433          0.1712         85.6     88
4.1             0.3261    0.3722          0.1489         74.4     69
4.3             0.7609    0.2233          0.1073         53.7     59
4.5             1.1956    0.1160          0.0644         32.2     35
4.7             1.6304    0.0516          0.0322         16.1     10
4.9             2.0652    0.0194          0.0132          6.6      8
5.1             2.5000    0.0062          0.0062          3.1      4
∞               ∞         0.0000
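The tabulated fitting procedure can be sketched in a few lines: take the cumulative probability at each class limit, difference adjacent values, and multiply by the total frequency. The sketch assumes the standard-library `statistics.NormalDist` in place of Table 2, so its figures may differ from the tabulated ones in the last digit; the variable names are illustrative.

```python
from statistics import NormalDist

MEAN, SD, TOTAL = 3.95, 0.46, 500
# Finite class limits in lb.; the outer classes extend to -infinity and +infinity.
limits = [2.9, 3.1, 3.3, 3.5, 3.7, 3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1]

dist = NormalDist(MEAN, SD)

# Cumulative probability at each limit, with 0 and 1 for the infinite limits.
cdf = [0.0] + [dist.cdf(x) for x in limits] + [1.0]

# Relative frequency of each class = difference of adjacent cumulative values.
relative = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [TOTAL * d for d in relative]

for d, e in zip(relative, expected):
    print(f"{d:.4f}  {e:6.1f}")
```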


6. As in the previous example, fit a normal distribution to the distribution of wages of 1,000 employees given in Ex. I, 3, showing that the class frequencies per thousand of the normal distribution are approximately 6.7, 11.3, 25.0, 48.0, 79.0, 113.1, 140.5, 151.0, 140.8, 113.5, 79.5, 48.1, 25.3, 11.5 and 6.7.

7. In 1,000 extensive sets of trials for an event of small probability, the frequencies f of the numbers x of successes proved to be

x:   0    1    2    3    4    5    6    7
f: 305  365  210   80   28    9    2    1

Show that the mean number of successes is 1.2, and hence that the frequencies of the Poissonian distribution, with the same mean and the same total frequency, are approximately 301.2, 361.4, 216.8, 86.7, 26.0, 6.2, 1.2, 0.2, .... Verify that the variance of the given distribution is 1.28.
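The arithmetic of Ex. 7 can be reproduced as follows. This sketch uses the rounded mean m = 1.2, as the exercise does, when computing the Poissonian frequencies; the variable names are illustrative.

```python
from math import exp, factorial

successes = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [305, 365, 210, 80, 28, 9, 2, 1]

total = sum(freq)
mean = sum(f * x for f, x in zip(freq, successes)) / total
variance = sum(f * x * x for f, x in zip(freq, successes)) / total - mean ** 2

m = 1.2  # the rounded mean used in the exercise
poisson_freq = [total * exp(-m) * m ** x / factorial(x) for x in successes]

print(total, round(mean, 3), round(variance, 3))
print([round(v, 1) for v in poisson_freq])
```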

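8. Sum of independent normal variates. Another proof of the theorem of § 22 may be given as follows. Let $x$ and $y$ be the independent normal variates, with a common mean zero and variances $\sigma_1^2$ and $\sigma_2^2$ respectively; and let $u = x + y$. Then, for a fixed value of $y$, $du = dx$. The probability that the value of $x$ will fall in the interval $dx$ is $\phi(x)\,dx$; and therefore, for a fixed value of $y$ in the interval $dy$, the probability that $u$ will fall in the interval $du$ is

$$dp_1 = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(u-y)^2}{2\sigma_1^2}\right] du.$$

But the probability of $y$ falling in the interval $dy$ is

$$dp_2 = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left[-\frac{y^2}{2\sigma_2^2}\right] dy,$$

and the compound probability of $u$ falling in the interval $du$ and, at the same time, $y$ in the interval $dy$ is the product $dp_1\,dp_2$. Integrating with respect to $y$ over its range of variation, we have the probability that $u$ will fall in the interval $du$, irrespective of the value of $y$, as

$$dp = \frac{du}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\left[-\frac{(u-y)^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right] dy.$$

In this expression the argument of the exponential may be put in the form

$$-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)} - \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(y - \frac{u\sigma_2^2}{\sigma_1^2+\sigma_2^2}\right)^2.$$

If then we change the variable of integration from $y$ to $t$, where

$$t = y - \frac{u\sigma_2^2}{\sigma_1^2+\sigma_2^2},$$

we have

$$dp = \frac{du}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right] \int_{-\infty}^{\infty} \exp\left[-\frac{(\sigma_1^2+\sigma_2^2)\,t^2}{2\sigma_1^2\sigma_2^2}\right] dt = \frac{du}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right]$$

in virtue of (10). Thus $u$ is normally distributed about zero as mean, with variance $\sigma_1^2+\sigma_2^2$. The reader should have no difficulty in removing the restriction of a common zero mean for the variates.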
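The integration over $y$ can also be carried out numerically as a check on the result. The sketch below uses a plain trapezoidal rule over a wide finite range in place of the infinite one; the variances, the range and the step count are hypothetical choices, and the function names are illustrative.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, var: float) -> float:
    """Density at x of a normal variate with zero mean and variance var."""
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)


def density_of_sum(u: float, var1: float, var2: float,
                   half_width: float = 40.0, steps: int = 8000) -> float:
    """Integrate phi_1(u - y) * phi_2(y) over y by the trapezoidal rule."""
    h = 2 * half_width / steps
    ys = [-half_width + k * h for k in range(steps + 1)]
    vals = [normal_pdf(u - y, var1) * normal_pdf(y, var2) for y in ys]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


var1, var2 = 4.0, 9.0  # hypothetical variances

# Compare with the normal density of variance var1 + var2 at a few points.
gap = max(
    abs(density_of_sum(u, var1, var2) - normal_pdf(u, var1 + var2))
    for u in (-3.0, 0.0, 1.5, 5.0)
)
print(gap)
```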

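The following is still another proof of the theorem. With the same notation let

$$u = x + y, \quad v = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y.$$

Then

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = \frac{u^2 + v^2}{2(\sigma_1^2+\sigma_2^2)},$$

and, as the values of $x$ and $y$ range from $-\infty$ to $\infty$, so do those of $u$ and $v$. The probability that $x$ and $y$ fall simultaneously in the respective intervals $dx$ and $dy$ is

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{x^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right] dx\,dy,$$

so that the probability density for the joint distribution of $x$ and $y$, i.e. the probability per unit area at the point $(x, y)$, is

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{x^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right].$$

Now the area of the element of the $xy$ plane, bounded by the curves along which $u$ has the values $u$ and $u + du$ respectively, and those along which $v$ has the values $v$ and $v + dv$, is

$$\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv = \frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\,du\,dv,$$

where $\partial(x,y)/\partial(u,v)$ is the Jacobian of $x$ and $y$ with respect to $u$ and $v$. Consequently the probability that, for a random choice of $x$ and $y$, the representative point $(x, y)$ will fall in this element of area is

$$\frac{1}{2\pi(\sigma_1^2+\sigma_2^2)} \exp\left[-\frac{u^2+v^2}{2(\sigma_1^2+\sigma_2^2)}\right] du\,dv = \left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{u^2}{2\sigma^2}\right)du\right]\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv\right],$$

where $\sigma^2 = \sigma_1^2+\sigma_2^2$. Since this is the probability that simultaneously $u$ will lie in the interval $du$ and $v$ in the interval $dv$, it follows that these variates are independent, and that each is normally distributed with zero as mean and $\sigma_1^2+\sigma_2^2$ as variance.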

9. Using the formulae of Ex. I, 11 show that, for the binomial distribution, the factorial moments about the mean are

$$\mu_{(2)} = npq, \quad \mu_{(3)} = -2npq(p+1),$$

and that, for the Poissonian distribution,

$$\mu_{(2)} = m, \quad \mu_{(3)} = -2m, \quad \mu_{(4)} = 3m(m+2).$$


MATHEMATICAL NOTES 
NOTE I 

Derivation of the Poissonian distribution (see §18) 

In the binomial distribution the probability of the value $r$ of the variate is $\binom{n}{r}p^r q^{n-r}$. We require the limiting value of this expression when $p = m/n$ and $n$ tends to infinity, $m$ being constant. The expression may be written

$$\frac{n!}{r!\,(n-r)!}\left(\frac{m}{n}\right)^r\left(1-\frac{m}{n}\right)^{n-r} = \frac{m^r}{r!}\left(1-\frac{m}{n}\right)^n \times \frac{n!}{(n-r)!\,n^r\,(1-m/n)^r}.$$

The limiting value of that part which precedes the sign of multiplication is $m^r e^{-m}/r!$. Then, using Stirling's formula for $n!$, viz.

$$n! \sim \sqrt{2\pi n}\;n^n e^{-n}$$

(the limiting value of the ratio of these two expressions, as $n$ tends to infinity, being unity), we have

$$\lim \frac{n!}{(n-r)!\,n^r\,(1-m/n)^r} = \lim \frac{\sqrt{2\pi n}\;n^n e^{-n}}{\sqrt{2\pi(n-r)}\;(n-r)^{n-r} e^{-(n-r)}\,n^r\,(1-m/n)^r} = \lim \frac{1}{e^r\,(1-r/n)^{n-r+\frac{1}{2}}\,(1-m/n)^r} = 1,$$

since $r$ is finite. Consequently

$$\lim \binom{n}{r}p^r q^{n-r} = \frac{m^r e^{-m}}{r!},$$

which is the required probability of the value $x = r$ in the Poissonian distribution with mean $m$.
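The limit can be watched numerically by holding $m = np$ fixed while $n$ grows. A minimal sketch with hypothetical values of $m$ and $r$; the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of r successes in n trials with chance p of success."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_pmf(r: int, m: float) -> float:
    """Probability of the value r in the Poissonian distribution with mean m."""
    return m ** r * exp(-m) / factorial(r)


m, r = 1.2, 3  # hypothetical values

# Gap between the binomial probability (with p = m/n) and its Poissonian limit.
gaps = [
    abs(binomial_pmf(r, n, m / n) - poisson_pmf(r, m))
    for n in (10, 100, 1000, 10000)
]
print(gaps)
```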


NOTE II 


Derivation of the normal distribution (see §19) 

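To examine the behaviour of the probability

$$P = \frac{n!}{(np+x)!\,(nq-x)!}\,p^{np+x} q^{nq-x}$$

as $n$ tends to infinity, replace the factorials by their values given by Stirling's formula (Note I). Then an easy algebraical simplification leads to

$$P = \frac{1}{N\sqrt{2\pi npq}},$$

where

$$N = \left(1+\frac{x}{np}\right)^{np+x+\frac{1}{2}}\left(1-\frac{x}{nq}\right)^{nq-x+\frac{1}{2}}.$$

Taking logarithms of both sides we may write, provided $|x|$ is less than the smaller of the two quantities $np$ and $nq$,

$$\log N = \frac{x}{2n}\left(\frac{1}{p}-\frac{1}{q}\right) + \frac{x^2}{2npq} - \frac{x^2}{4n^2}\left(\frac{1}{p^2}+\frac{1}{q^2}\right) - \frac{x^3}{6n^2}\left(\frac{1}{p^2}-\frac{1}{q^2}\right) + \text{terms with higher powers of } 1/n.$$

Introducing a new variable, $z = x/\sqrt{n}$, we may write the above

$$\log N = \frac{z}{2\sqrt{n}}\left(\frac{1}{p}-\frac{1}{q}\right) + \frac{z^2}{2pq} - \frac{z^2}{4n}\left(\frac{1}{p^2}+\frac{1}{q^2}\right) - \frac{z^3}{6\sqrt{n}}\left(\frac{1}{p^2}-\frac{1}{q^2}\right) + \ldots$$

This series is convergent so long as $|z|$ is less than the smaller of the quantities $p\sqrt{n}$ and $q\sqrt{n}$. Now let $n$ tend to infinity. Then, provided that neither $p$ nor $q$ is very small, and that $|z|$ is either finite or of lower order than $n^{1/6}$, all the terms of the above series tend to zero except $z^2/2pq$; so that, when $n$ tends to infinity, we have

$$\log N = \frac{z^2}{2pq} \quad \text{or} \quad N = \exp(z^2/2pq).$$

Corresponding to unit increment in $x$ we have the increment $1/\sqrt{n}$ in $z$, which may be denoted by $dz$ when $n$ tends to infinity. And, if we write $dP$ for the limiting value of $P$, the above formula for $P$ becomes

$$dP = \frac{1}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right) dz,$$

giving the probability of $z$ falling in the interval $dz$; and we have the required continuous distribution of $z$.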
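The quality of the limiting formula for a moderately large $n$ can be gauged by comparing it with the exact binomial probability. In this sketch $n$ and $p$ are hypothetical values chosen so that $np$ is an integer; the function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def binomial_at_deviation(x: int, n: int, p: float) -> float:
    """Exact probability of np + x successes in n trials (np taken as an integer)."""
    r = round(n * p) + x
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def normal_limit(x: int, n: int, p: float) -> float:
    """(1/sqrt(2 pi p q)) exp(-z^2 / 2pq) dz with z = x/sqrt(n) and dz = 1/sqrt(n)."""
    q = 1 - p
    z = x / sqrt(n)
    return exp(-z * z / (2 * p * q)) / sqrt(2 * pi * p * q) / sqrt(n)


n, p = 400, 0.5  # hypothetical values

# Relative error of the limiting formula at a few deviations x from the mean np.
rel_errors = [
    abs(binomial_at_deviation(x, n, p) / normal_limit(x, n, p) - 1)
    for x in (0, 5, 10)
]
print(rel_errors)
```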


NOTE III 

An important integral 

To evaluate the integral

$$I = \int_0^\infty \exp(-x^2)\,dx$$

let us write

$$I_1 = \int_0^a \exp(-x^2)\,dx = \int_0^a \exp(-y^2)\,dy.$$

Then

$$I_1^2 = \int_0^a\!\!\int_0^a \exp(-x^2-y^2)\,dx\,dy.$$

If we regard $x$, $y$ as Cartesian coordinates in a plane, the integration is extended over the area of the square OACB, bounded by $x = 0$, $x = a$, $y = 0$, $y = a$ (see Fig. 4). In polar coordinates the above relation is

$$I_1^2 = \iint \exp(-r^2)\,r\,dr\,d\theta,$$

the integration being extended over the same square. Since the integrand is positive, the integral is intermediate in value between the integrals over the quadrants of circles, with centre O and radii $a$ and $a\sqrt{2}$ respectively. Consequently $I_1^2$ lies in value between

$$\int_0^{\frac{1}{2}\pi} d\theta \int_0^a r\exp(-r^2)\,dr \quad \text{and} \quad \int_0^{\frac{1}{2}\pi} d\theta \int_0^{a\sqrt{2}} r\exp(-r^2)\,dr.$$

But, as $a$ tends to infinity, each of these integrals converges to $\frac{1}{4}\pi$. Hence

$$I^2 = \lim I_1^2 = \tfrac{1}{4}\pi,$$

and therefore

$$\int_0^\infty \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi}.$$

The substitution $x = z/(\sigma\sqrt{2})$ then gives

$$\int_0^\infty \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{\tfrac{1}{2}\pi},$$

and, since the integrand is an even function of $z$,

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}.$$
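Both results can be confirmed by numerical quadrature. The sketch below uses a simple composite Simpson rule over a finite range wide enough for the neglected tails to be negligible; the value of σ, the ranges and the helper name `simpson` are assumptions of the sketch.

```python
from math import exp, pi, sqrt


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + k * h) for k in range(1, n, 2))
    total += 2 * sum(f(a + k * h) for k in range(2, n, 2))
    return total * h / 3


# Integral of exp(-x^2) from 0 to infinity, truncated at x = 10.
half_line = simpson(lambda x: exp(-x * x), 0.0, 10.0)

# Integral of exp(-z^2 / 2 sigma^2) over the whole line, truncated at |z| = 40.
sigma = 3.0  # hypothetical value
full_line = simpson(lambda z: exp(-z * z / (2 * sigma ** 2)), -40.0, 40.0)

print(half_line, sqrt(pi) / 2)
print(full_line, sigma * sqrt(2 * pi))
```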






CHAPTER IV

BIVARIATE DISTRIBUTIONS.
REGRESSION AND CORRELATION

23. Discrete distributions. Moments 
Suppose that we have records for $N$ marriages, giving the ages of bridegroom and bride, $x$ and $y$ years respectively. Then to each marriage corresponds a pair of values $(x_i, y_i)$ of the variables. Each pair may be represented by a point in the $x$, $y$ plane, the coordinates of the point being the pair of values represented. Such a graphical representation is called a scatter diagram. Possibly the pairs of values are not all different. If the pair $(x_i, y_i)$ occurs $f_i$ times, then $f_i$ is the frequency of that pair; and, as in the case of a single variable,

$$\sum_{i=1}^{n} f_i = N, \tag{1}$$

$n$ being the number of different pairs of values. The assemblage of pairs of values, together with their frequencies, constitutes a bivariate frequency distribution. Other examples of bivariate distributions are furnished by the heights and the weights of a group of men, or by the amount of fertilizer per acre and the yield of grain per acre on a number of different plots of land.

The moments of a bivariate distribution are generalizations of those of a univariate one. Thus the moment about the origin (0, 0), of order $r$ in $x$ and $s$ in $y$, is defined by

$$\mu'_{rs} = \frac{1}{N}\sum f_i x_i^r y_i^s. \tag{2}$$

In particular

$$\mu'_{10} = \frac{1}{N}\sum f_i x_i = \bar{x}, \quad \mu'_{01} = \frac{1}{N}\sum f_i y_i = \bar{y}, \tag{3}$$

where $\bar{x}$ is the mean value of $x$ in the distribution, and $\bar{y}$ the mean value of $y$. Similarly,

$$\mu'_{20} = \frac{1}{N}\sum f_i x_i^2 = \sigma_x^2 + \bar{x}^2, \quad \mu'_{02} = \frac{1}{N}\sum f_i y_i^2 = \sigma_y^2 + \bar{y}^2, \tag{4}$$

where $\sigma_x^2$ is the variance of the variable $x$ in the distribution, or the moment $\mu_{20}$ about the mean $(\bar{x}, \bar{y})$; and likewise $\sigma_y^2$ is the variance of $y$, or the moment $\mu_{02}$ about the mean. Lastly, we may mention the moment $\mu'_{11}$, which is given by

$$\mu'_{11} = \frac{1}{N}\sum f_i x_i y_i.$$

The corresponding moment $\mu_{11}$ about the mean of the distribution is called the covariance* of the variables. Thus

$$\mu_{11} = \frac{1}{N}\sum f_i (x_i-\bar{x})(y_i-\bar{y}) = \mu'_{11} - \bar{x}\sum f_i y_i/N - \bar{y}\sum f_i x_i/N + \bar{x}\bar{y} = \mu'_{11} - \bar{x}\bar{y} \tag{5}$$

in virtue of (1) and (3). This formula is very important, and will be used frequently.
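Formula (5) can be checked on any small frequency table. The following sketch uses exact rational arithmetic on a made-up set of pairs and frequencies; the data and the helper name `moment` are hypothetical.

```python
from fractions import Fraction

# Hypothetical distinct pairs (x_i, y_i) with their frequencies f_i.
pairs = [(22, 20), (25, 23), (25, 24), (30, 26), (35, 28)]
freq = [3, 5, 2, 4, 1]
N = sum(freq)


def moment(r: int, s: int) -> Fraction:
    """Moment about the origin of order r in x and s in y, as in (2)."""
    return Fraction(sum(f * x ** r * y ** s for (x, y), f in zip(pairs, freq)), N)


x_bar, y_bar = moment(1, 0), moment(0, 1)

# Covariance from its definition as a moment about the mean ...
cov_direct = sum(f * (x - x_bar) * (y - y_bar) for (x, y), f in zip(pairs, freq)) / N
# ... and from the short formula (5).
cov_short = moment(1, 1) - x_bar * y_bar

print(cov_direct, cov_short)
```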

24. Continuous distributions 

A continuous bivariate distribution includes all pairs of values represented by points within a certain region of the $xy$ plane, each pair of values occurring at least once. The number of pairs of values is therefore infinite. Distributions of the type ordinarily employed have each for relative frequency density a continuous function $f(x, y)$, which is such that the relative frequency for the infinitesimal rectangular region, of area $dx\,dy$, and defined by

$$x - \tfrac{1}{2}dx \le x \le x + \tfrac{1}{2}dx, \quad y - \tfrac{1}{2}dy \le y \le y + \tfrac{1}{2}dy,$$

has the value $f(x, y)\,dx\,dy$. Thus $f(x, y)$ is the relative frequency per unit area at the point $(x, y)$, and the sum of the relative frequencies for all elements of the distribution is unity, so that

$$\iint f(x, y)\,dx\,dy = 1, \tag{6}$$

the integration extending over the region* of the plane which represents the distribution. The mean $(\bar{x}, \bar{y})$ of the distribution is given by

$$\bar{x} = \iint x f(x, y)\,dx\,dy, \quad \bar{y} = \iint y f(x, y)\,dx\,dy. \tag{7}$$

* Many writers denote this quantity by $p_{xy}$. We are, however, loth to overwork the symbol $p$ by giving it still another meaning.

For higher moments we have formulae corresponding to those of a discrete distribution. Thus

$$\sigma_x^2 = \mu_{20} = \iint (x-\bar{x})^2 f(x, y)\,dx\,dy,$$

which is equivalent to

$$\sigma_x^2 = \mu'_{20} - \bar{x}^2, \tag{8}$$

and similarly,

$$\sigma_y^2 = \mu'_{02} - \bar{y}^2. \tag{9}$$

The covariance, being the first product moment about the mean, is

$$\mu_{11} = \iint (x-\bar{x})(y-\bar{y}) f(x, y)\,dx\,dy = \mu'_{11} - \bar{x}\iint y f(x, y)\,dx\,dy - \bar{y}\iint x f(x, y)\,dx\,dy + \bar{x}\bar{y} = \mu'_{11} - \bar{x}\bar{y}, \tag{10}$$

in virtue of (6) and (7).

It will be sufficient, as a rule, to give proofs of theorems for a discrete distribution. The student can easily rewrite these for the case of a continuous distribution, replacing sums by definite integrals, and the relative frequency $f_i/N$ by $f(x, y)\,dx\,dy$. Some of the steps will be indicated in Examples.


25. Lines of regression 

It frequently happens that the scatter diagram indicates an association between the variables, $x$ and $y$, the distribution of dots being denser in the neighbourhood of a certain curve, which may be called a curve of regression. The equation of such a curve indicates a functional relationship to which the association of the variables approximates, more or less roughly. We shall consider later the problem of fitting different curves to the data, so as to obtain the curve of a specified form which best fits the data. For the present we confine our attention to the straight line which is the best fit, or more definitely, to the problem of determining two straight lines, one of which gives the closest estimate a straight line can give to the average value of $y$ for each specified value of $x$, while the other gives the corresponding estimate of $x$ for a given value of $y$. These are called the lines of regression of $y$ on $x$, and of $x$ on $y$ respectively.

* By defining $f(x, y)$ as equal to zero outside this region, we may make the range of integration $-\infty$ to $\infty$ for each variable.

Consider first the line of regression of $y$ on $x$. We have to determine constants $a$ and $b$ so that the equation

$$y = a + bx \tag{11}$$

gives, for each value of $x$, the best estimate a linear equation can give for the average value of $y$. We interpret the term 'best estimate' in accordance with the principle of least squares; that is to say, we find $a$ and $b$ so as to minimize the sum of the squares of the deviations of the actual values of $y$ in the distribution from their estimates given by (11). Thus if $P_i$ is the point of the diagram representing the pair of values $(x_i, y_i)$, and $H_i$ is the point on the straight line (11) with the same abscissa $x_i$, the deviation of $y_i$ from its estimate is $H_iP_i$, whose value is clearly $y_i - (a + bx_i)$. Thus we have to choose $a$ and $b$ so as to make $\sum f_i (y_i - a - bx_i)^2$ a minimum, $f_i$ being the frequency of the pair of values $(x_i, y_i)$. For a minimum value the partial derivatives of the above expression with respect to $a$ and $b$ must both be zero. Hence the equations for determining these constants are

$$\sum f_i (y_i - a - bx_i) = 0, \quad \sum f_i x_i (y_i - a - bx_i) = 0, \tag{12}$$

which are called the normal equations. In virtue of (3), (4) and (5) they are equivalent to

$$\bar{y} - a - b\bar{x} = 0 \tag{13}$$

and

$$\mu_{11} + \bar{x}\bar{y} - a\bar{x} - b(\sigma_x^2 + \bar{x}^2) = 0. \tag{14}$$

The first of these shows that the required line passes through the mean $(\bar{x}, \bar{y})$ of the distribution. Also, on eliminating $a$ between (13) and (14), we find

$$\mu_{11} - b\sigma_x^2 = 0.$$

Consequently the gradient $b$ of the line of regression of $y$ on $x$ is

$$b = \frac{\mu_{11}}{\sigma_x^2}, \tag{15}$$

and, since the line passes through $(\bar{x}, \bar{y})$, its equation may be expressed

$$y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}(x - \bar{x}). \tag{16}$$

If the mean of the distribution is taken as origin, the line of regression of $y$ on $x$ is

$$y = \frac{\mu_{11}}{\sigma_x^2}\,x. \tag{17}$$

The gradient $\mu_{11}/\sigma_x^2$ is often called the coefficient of regression of $y$ on $x$.

Similarly, or by interchanging variables, we find that the line of regression of $x$ on $y$ is

$$x - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}(y - \bar{y}), \tag{18}$$

and $\mu_{11}/\sigma_y^2$ is the coefficient of regression of $x$ on $y$. The product of the two coefficients of regression is symmetrical with respect to $x$ and $y$. Its square root, having the same sign as the covariance $\mu_{11}$, is the coefficient of correlation, $r$. Thus

$$r = \frac{\mu_{11}}{\sigma_x\sigma_y}. \tag{19}$$
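Formulae (15) to (19) translate directly into a short routine. The sketch below is one possible arrangement, with a hypothetical function name `regression_summary` and made-up data; it is not taken from the text.

```python
from math import sqrt


def regression_summary(xs, ys, fs):
    """Means, regression coefficients and r for pairs (x_i, y_i) with frequencies f_i."""
    n_total = sum(fs)
    x_bar = sum(f * x for f, x in zip(fs, xs)) / n_total
    y_bar = sum(f * y for f, y in zip(fs, ys)) / n_total
    var_x = sum(f * (x - x_bar) ** 2 for f, x in zip(fs, xs)) / n_total
    var_y = sum(f * (y - y_bar) ** 2 for f, y in zip(fs, ys)) / n_total
    cov = sum(f * (x - x_bar) * (y - y_bar) for f, x, y in zip(fs, xs, ys)) / n_total
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "b_y_on_x": cov / var_x,          # gradient in (16)
        "b_x_on_y": cov / var_y,          # gradient in (18)
        "r": cov / sqrt(var_x * var_y),   # formula (19)
    }


# Hypothetical data, each pair occurring once.
result = regression_summary([1, 2, 3, 4], [2, 3, 5, 6], [1, 1, 1, 1])
print(result)
```

With these data the product of the two regression coefficients equals $r^2$, as the text states.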





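Example 1. Find the tangent of the inclination, $\theta$, of (18) to (16).

Since the gradients of the two lines are $\sigma_y/(r\sigma_x)$ and $r\sigma_y/\sigma_x$ we have

$$\tan\theta = \frac{1-r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}.$$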


Example 2. For a continuous bivariate distribution the mean square deviation from the line of regression of $y$ on $x$ is

$$\iint (y - a - bx)^2 f(x, y)\,dx\,dy.$$

Hence show that the normal equations are

$$\iint (y - a - bx) f(x, y)\,dx\,dy = 0 = \iint x(y - a - bx) f(x, y)\,dx\,dy,$$

and that these are expressible in the forms (13) and (14).


26. Coefficient of correlation. Standard error of estimate

The significance of the correlation coefficient, $r$, as a measure of the closeness of the association of the variables $x$ and $y$, will be apparent from the theorem now to be considered. The equation of the line of regression of $y$ on $x$ was found by minimizing the sum of the squares of the deviations $H_iP_i$. We shall now prove that the sum of the squares of these deviations from the line of regression of $y$ on $x$ is equal to $N\sigma_y^2(1 - r^2)$.

Let the mean of the distribution be taken as origin, so that $\bar{x} = \bar{y} = 0$. Then, in virtue of (17), $H_iP_i$ is equal to $y_i - bx_i$; and the required sum of squares of the deviations is

$$\sum f_i (y_i - bx_i)^2 = \sum f_i y_i^2 - 2b\sum f_i x_i y_i + b^2\sum f_i x_i^2 = N\sigma_y^2 - 2Nb\mu_{11} + Nb^2\sigma_x^2 = N\sigma_y^2 - N\mu_{11}^2/\sigma_x^2 = N\sigma_y^2(1 - r^2), \tag{20}$$

as stated above. Denoting this sum of squares by $NS_y^2$, we have

$$S_y^2 = \sigma_y^2(1 - r^2). \tag{21}$$

Since $S_y^2$ is the mean square deviation of points from the line of regression of $y$ on $x$, $S_y$ is called the standard error of estimate of $y$ from the regression equation (16). In the same way the sum of the squares of the deviations of points from the line of regression of $x$ on $y$, measured parallel to the $x$-axis, is $NS_x^2$, where

$$S_x^2 = \sigma_x^2(1 - r^2), \tag{23}$$

and $S_x$ is the standard error of estimate of $x$ from the regression equation (18).

Since the sum of squares of deviations cannot be negative, it follows from (20) that $r^2 \le 1$, or

$$-1 \le r \le 1. \tag{24}$$

If $r = 1$ or $-1$, the sum of squares of deviations from either line of regression is zero. Consequently each deviation is zero, and all the points lie on both lines of regression. These two lines then coincide, and there is a linear functional relation between the variables $x$ and $y$, giving perfect correlation. The nearer $r^2$ is to unity, the closer are the points to the lines of regression, and the nearer are these two lines to coincidence (cf. § 25, Ex. 1). Thus the magnitude of $r$ may be taken as a measure of the degree to which the association between the variables approaches a linear functional relationship. The sign of $r$ is the same as that of the covariance $\mu_{11}$, and therefore also the same as that of the gradients of the lines of regression. Hence $r$ is positive when, on the whole, $y$ increases with $x$, and negative when $y$ decreases as $x$ increases. When $r$ is zero the variables are usually described as uncorrelated.

The coefficient of correlation between two variables is also the coefficient of correlation between the deviations of the variables from their means. For the value of $r$ depends only on $\mu_{11}$, $\sigma_x$ and $\sigma_y$, and these are functions of the above deviations. Thus

$$r = \frac{\sum f_i (x_i - \bar{x})(y_i - \bar{y})}{N\sigma_x\sigma_y},$$

and, in virtue of (3), (4) and (5), this may be expressed

$$r = \frac{\sum f_i x_i y_i - N\bar{x}\bar{y}}{\sqrt{\left(\sum f_i x_i^2 - N\bar{x}^2\right)\left(\sum f_i y_i^2 - N\bar{y}^2\right)}}.$$

This formula is sometimes convenient.
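The identity (20) lends itself to a direct numerical check: fit the line of regression of $y$ on $x$, sum the squared deviations, and compare with $N\sigma_y^2(1 - r^2)$. The data below are hypothetical and the variable names illustrative.

```python
from math import sqrt

# Hypothetical pairs (x_i, y_i) with frequencies f_i.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.5]
fs = [2, 1, 3, 1, 2]

N = sum(fs)
x_bar = sum(f * x for f, x in zip(fs, xs)) / N
y_bar = sum(f * y for f, y in zip(fs, ys)) / N
var_x = sum(f * (x - x_bar) ** 2 for f, x in zip(fs, xs)) / N
var_y = sum(f * (y - y_bar) ** 2 for f, y in zip(fs, ys)) / N
cov = sum(f * (x - x_bar) * (y - y_bar) for f, x, y in zip(fs, xs, ys)) / N

b = cov / var_x              # gradient of the line of regression of y on x
a = y_bar - b * x_bar        # the line passes through the mean
r = cov / sqrt(var_x * var_y)

# Left and right members of (20).
residual_ss = sum(f * (y - a - b * x) ** 2 for f, x, y in zip(fs, xs, ys))
identity_rhs = N * var_y * (1 - r * r)

# Standard error of estimate of y, from (21).
S_y = sqrt(var_y * (1 - r * r))

print(residual_ss, identity_rhs, S_y)
```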




Example. Taking the mean as origin for a continuous distribution, show that the mean square deviation, $S_y^2$, from the line of regression of $y$ on $x$ is given by

$$S_y^2 = \iint (y - bx)^2 f(x, y)\,dx\,dy = \sigma_y^2 - 2b\mu_{11} + b^2\sigma_x^2 = \sigma_y^2(1 - r^2).$$


27. Estimates from the regression equation 
A few simple relations between the $y_i$ and their estimates, $Y_i$, given by the regression equation (11) or (16), play an important part in later work. Thus

$$Y_i = a + bx_i, \tag{26}$$

and the normal equations (12) are expressible as

$$\sum f_i (y_i - Y_i) = 0 \tag{27}$$

and

$$\sum f_i x_i (y_i - Y_i) = 0. \tag{28}$$

From (27) it follows that the mean of the $Y$'s is equal to the mean $\bar{y}$ of the $y$'s. Also, on multiplying (27) and (28) by $a$ and $b$ respectively and adding, we deduce

$$\sum f_i Y_i (y_i - Y_i) = 0, \tag{29}$$

and therefore, in virtue of (27),

$$\sum f_i (y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{30}$$

This relation is very important.

The sum of the squares of the deviations of the $y$'s from their mean may then be expressed

$$\sum f_i (y_i - \bar{y})^2 = \sum f_i [(y_i - Y_i) + (Y_i - \bar{y})]^2 = \sum f_i (y_i - Y_i)^2 + \sum f_i (Y_i - \bar{y})^2, \tag{31}$$

since the sum of the products vanishes by (30). Now the first member of (31) has the value $N\sigma_y^2$. The first sum in the second member has been proved equal to $N\sigma_y^2(1 - r^2)$. Consequently

$$\sum f_i (Y_i - \bar{y})^2 = Nr^2\sigma_y^2, \tag{32}$$

showing that the variance of the $Y$'s is $r^2$ times that of the $y$'s; or

$$\sigma_Y^2 = r^2\sigma_y^2, \quad \sigma_Y = |r|\,\sigma_y. \tag{33}$$

Finally, we may show that the coefficient of correlation between the $y_i$ and their estimates $Y_i$ is equal to $|r|$. Take the origin at the common mean of these variables, so that $\bar{y} = 0$. Then, in virtue of (29),

$$\sum f_i y_i Y_i = \sum f_i Y_i^2 = N\sigma_Y^2,$$

and the coefficient of correlation between the $y$'s and the $Y$'s is

$$\frac{\sum f_i y_i Y_i}{N\sigma_y\sigma_Y} = \frac{\sigma_Y^2}{\sigma_y\sigma_Y} = \frac{\sigma_Y}{\sigma_y} = |r|. \tag{34}$$

Example 1. Prove (34) as follows. Taking the mean of the distribution as origin, so that $Y_i = bx_i$, $\sigma_Y = |b|\,\sigma_x$, we have for the required correlation coefficient

$$\frac{\sum f_i y_i Y_i}{N\sigma_y\sigma_Y} = \frac{bN\mu_{11}}{N|b|\,\sigma_x\sigma_y} = \frac{b}{|b|}\,r = |r|,$$

since $r$ and $b$ have the same sign.

Example 2. Show that the normal equations for a continuous distribution (cf. § 25, Ex. 2) are expressible as

$$\iint (y - Y) f(x, y)\,dx\,dy = 0, \quad \iint x(y - Y) f(x, y)\,dx\,dy = 0,$$

and from these deduce the relations

$$\iint Y(y - Y) f(x, y)\,dx\,dy = 0, \quad \iint (y - Y)(Y - \bar{y}) f(x, y)\,dx\,dy = 0$$

corresponding to (29) and (30). Hence show that

$$\sigma_y^2 = \iint (y - Y)^2 f(x, y)\,dx\,dy + \iint (Y - \bar{y})^2 f(x, y)\,dx\,dy,$$

and deduce that

$$\iint (Y - \bar{y})^2 f(x, y)\,dx\,dy = r^2\sigma_y^2.$$


28. Change of units 

Before illustrating the above theory by a numerical example, let us examine the effect of a change of units on the calculation of $r$, $\mu_{11}$ and $b$. As in § 2, let $u$ be the measure of the deviation of $x$ from an origin $x = a$, in terms of a unit $c$ times the original $x$-unit, so that $x = a + cu$. Then, if $x'$ and $u'$ are the deviations of the two variables from their means, $x' = cu'$, and

$$\sigma_x = c\,\sigma_u.$$

Similarly, if $v$ is the measure of the deviation of $y$ from an origin $y = a'$, in terms of a unit $c'$ times the original $y$-unit, $y = a' + c'v$, $y' = c'v'$ and $\sigma_y = c'\sigma_v$. Then the coefficient of correlation between $x$ and $y$ is

$$r = \frac{\sum f_i x'_i y'_i}{N\sigma_x\sigma_y} = \frac{\sum f_i u'_i v'_i}{N\sigma_u\sigma_v} = \frac{\text{covariance of } u \text{ and } v}{\sigma_u\sigma_v},$$

and is thus equal to the correlation between $u$ and $v$. The value of $r$ is therefore independent of the units employed, and is thus an absolute measure of correlation. But the covariance of $x$ and $y$ is

$$\mu_{11} = \sum f_i x'_i y'_i/N = cc'\sum f_i u'_i v'_i/N = cc'\,(\text{covariance of } u \text{ and } v). \tag{35}$$

Similarly, the regression coefficient of $y$ on $x$ is

$$b = \frac{\mu_{11}}{\sigma_x^2} = \frac{c'}{c}\,(\text{regression coefficient of } v \text{ on } u). \tag{36}$$

If then $c$ and $c'$ are equal, the value $b$ is the same for both pairs of variables.
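The effect of a change of origin and unit can be seen numerically. In the sketch below the data, the origins and the units $c = 5$, $c' = 2$ are all hypothetical, and `summary` is an illustrative helper name.

```python
from math import sqrt


def summary(us, vs, fs):
    """Return (covariance, correlation, regression coefficient of the second
    variable on the first) for frequency-weighted pairs."""
    n_total = sum(fs)
    u_bar = sum(f * u for f, u in zip(fs, us)) / n_total
    v_bar = sum(f * v for f, v in zip(fs, vs)) / n_total
    cov = sum(f * (u - u_bar) * (v - v_bar) for f, u, v in zip(fs, us, vs)) / n_total
    var_u = sum(f * (u - u_bar) ** 2 for f, u in zip(fs, us)) / n_total
    var_v = sum(f * (v - v_bar) ** 2 for f, v in zip(fs, vs)) / n_total
    return cov, cov / sqrt(var_u * var_v), cov / var_u


# Hypothetical data in the original units.
xs = [22.5, 27.5, 32.5, 37.5, 42.5]
ys = [21.0, 25.0, 27.0, 33.0, 35.0]
fs = [3, 5, 4, 2, 1]

# New origins a, a2 and new units c, c2, so that x = a + c*u and y = a2 + c2*v.
a, c = 27.5, 5.0
a2, c2 = 25.0, 2.0
us = [(x - a) / c for x in xs]
vs = [(y - a2) / c2 for y in ys]

cov_xy, r_xy, b_y_on_x = summary(xs, ys, fs)
cov_uv, r_uv, b_v_on_u = summary(us, vs, fs)

print(r_xy, r_uv)                        # equal: r does not depend on the units
print(cov_xy, c * c2 * cov_uv)           # relation (35)
print(b_y_on_x, (c2 / c) * b_v_on_u)     # relation (36)
```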

29. Numerical illustration 

For 1,000 marriages the ages of bridegroom and bride, x and y years 
respectively, are grouped in the table below with class interval of 5 years for 
each, the frequencies for the different classes being shown in the body of the 
table. Find the regression equations and the coefficient of correlation between 
the variables. 

Such a table is called a correlation table. The values of x and y indicated 
are the mid-values in the classes. Thus for the class in which the age of the 
bridegroom is between 26 and 30, and the age of the bride between 20 and 26, 
the values of x and y are taken as 27*6 and 22*6 respectively, and the frequency 
is 190. The data represented in any one column constitute a vertical array, 
or an array of y’s, because in each such array y assumes different values 
while x remains constant. Similarly, the data in any row constitute a hori¬ 
zontal array, or an array of a;’s. In the row prefixed the frequencies 
are given for the individual columns, and in the column headed JV, the 
frequencies for the separate rows. 

Let us take as new origin the point (27*5, 27*6), whose coordinates are 
the mid-values of the class 26 to 30 for x and y\ and, as a new unit 
for each variable, the common class interval of 6 years. Then the deviations, 
u and V, of the ages of bridegroom and bride from the new origin in terms of 
the new unit are l^hose shown in the second row and the seobnd column. The 



Ages of Bridegroom and Bride 







78 Bivariate Distributions [IV 

fourth row from the bottom of the table, prefixed gives the sum 2/m 
for each column; and these are added horizontally to give the total sum 124 
for the distribution. The next row gives the sum 2/u^ for each column ,with 
total sum 1,834 for the distribution. The row prefixed V gives for each 
column the sum 2/^?. Thus the sum — 28, in the column u = — 2, is obtained 
from 

ll(-2) + 6(-l) = -28. 

The last row gives the sum 2/wv for each column, each entry being obtained 
by multiplying the value of V above by the common value of u for that 
column. Summing horizontally all the values nV, we have the product sum 
2/mv for the whole distribution, viz. 1,109. / 

The columns to the right of the table are explained similarly. That headed 
U gives for each row the sum 2/m. Thus the first entry — 79 in the column is 
obtained from 

ll(-2) + 62(~l) + 19x0 + 3x 1 + 1x2 = -79. 

The last column, headed vC7, gives the sum 2 fuv for each row, any entry 
being obtained by multiplying the value of U to the loft by the common value 
of V for that row. Summing the entries in this colunm we find again the 
product sum 2/uv for the whole distribution, thus providing a check on the 
calculations. 

Using the values thus obtained we have

$$\bar{u} = 124/1000 = 0.124, \qquad \bar{v} = -371/1000 = -0.371.$$

Therefore

$$\bar{x} = 27.5 + 5(0.124) = 28.120, \qquad \bar{y} = 27.5 - 5(0.371) = 25.645,$$

giving the mean ages of bridegroom and bride. The variance of u is

$$\sigma_u^2 = 1.834 - (0.124)^2 = 1.8186,$$

so that $\sigma_u = 1.349 = 1.35$ nearly, and therefore $\sigma_x = 6.745 = 6.75$ nearly.

Similarly, we find

$$\sigma_v^2 = 1.531 - (0.371)^2 = 1.3934, \qquad \sigma_v = 1.18, \qquad \sigma_y = 5.90.$$

The coefficient of correlation between x and y is equal to that between u
and v. Thus

$$r = \frac{\sum fuv/N - \bar{u}\bar{v}}{\sigma_u\sigma_v} = \frac{1.109 + 0.371 \times 0.124}{1.349 \times 1.180} = 0.726 = 0.73 \text{ nearly}.$$

Since the units for u and v are equal, the regression coefficient of y on x is
equal to that of v on u, which is

$$(\text{covariance of } u \text{ and } v)/\sigma_u^2 = 1.155/1.8186 = 0.635.$$

The line of regression of y on x is therefore

$$y - 25.645 = 0.635(x - 28.12),$$

that is $y = 7.79 + 0.635x$.

The student may find similarly that the line of regression of x on y is

$$x = 6.86 + 0.829y.$$
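The arithmetic of this illustration can be retraced mechanically from the six totals of the table. The following is only a sketch in Python: the variable names are ours, and the total 1,531 for $\sum fv^2$ is inferred from the variance line above rather than read from the table itself.

```python
from math import sqrt

# Totals of the correlation table in the coded units u and v.
# sum_fv2 = 1531 is inferred from the printed variance of v (an assumption).
N = 1000
sum_fu, sum_fv = 124, -371
sum_fu2, sum_fv2 = 1834, 1531
sum_fuv = 1109
origin_x, origin_y, unit = 27.5, 27.5, 5.0  # new origin and class interval

# Means, variances and covariance in coded units.
u_bar, v_bar = sum_fu / N, sum_fv / N
var_u = sum_fu2 / N - u_bar ** 2
var_v = sum_fv2 / N - v_bar ** 2
cov_uv = sum_fuv / N - u_bar * v_bar

# Decode the means and standard deviations to years.
x_bar = origin_x + unit * u_bar
y_bar = origin_y + unit * v_bar
sd_x = unit * sqrt(var_u)
sd_y = unit * sqrt(var_v)

# r and the regression coefficients are unchanged by the coding,
# because the units of u and v are equal.
r = cov_uv / sqrt(var_u * var_v)
slope_y_on_x = cov_uv / var_u
slope_x_on_y = cov_uv / var_v
intercept_y_on_x = y_bar - slope_y_on_x * x_bar
intercept_x_on_y = x_bar - slope_x_on_y * y_bar

print(x_bar, y_bar, sd_x, sd_y)
print(r, slope_y_on_x, intercept_y_on_x)
print(slope_x_on_y, intercept_x_on_y)
```

Small differences in the last figure from the values in the text are to be expected, since the text rounds at each step.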


30. Correlation of ranks 

A group of n individuals may be arranged in order of merit or
proficiency in the possession of a certain characteristic. The same
group would, as a rule, give different orders for different characteristics.
Considering the orders corresponding to two characteristics,
A and B, let $x_i$, $y_i$ be the ranks of the ith individual in A and B
respectively. Then the coefficient of correlation between the x's and
the y's is called the rank correlation coefficient in the characteristics
A and B for that group of individuals. On the assumption that no
two individuals are bracketed equal in either classification, each of
the variables takes the values 1, 2, 3, ..., n; and therefore

$$\bar{x} = \tfrac{1}{2}(n + 1) = \bar{y}. \qquad (37)$$

As a rule $x_i$ is not equal to $y_i$. Let $d_i$ denote the difference, so that

$$d_i = x_i - y_i. \qquad (38)$$

Then, if x' and y' denote the deviations of the variables from their
means, we have also

$$d_i = x_i' - y_i'. \qquad (39)$$


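The coefficient of correlation between the variables is given by

$$r = \frac{\sum x_i' y_i'}{\sqrt{\sum x_i'^2 \sum y_i'^2}}. \qquad (40)$$

To express this in terms of n and the differences, we observe that
the variance of each of the variables x and y is $(n^2 - 1)/12$ (cf. §3,
Ex. 2). Therefore

$$\sum x_i'^2 = \sum y_i'^2 = n(n^2 - 1)/12. \qquad (41)$$

Also

$$\sum d_i^2 = \sum x_i'^2 + \sum y_i'^2 - 2\sum x_i' y_i',$$

and thus, in virtue of (41),

$$\sum x_i' y_i' = \frac{n(n^2 - 1)}{12} - \frac{1}{2}\sum d_i^2.$$

Substitution of these values in (40) gives

$$r = 1 - \frac{6\sum d_i^2}{n^3 - n}. \qquad (42)$$

This is the required formula for the coefficient of correlation of
ranks.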

If the correlation is perfect all the d's are zero, and r = 1. If the
orders in the two characteristics are exactly the reverse of each
other, $x_i + y_i = n + 1$, and the points of the scatter diagram then lie
on a straight line with negative gradient. Consequently r = -1,
and there is perfect inverse correlation.

Example. The ranks of the same 16 students in Mathematics and Physics
were as follows, two numbers within brackets denoting the ranks of the same
student in Mathematics and Physics respectively: (1,1), (2,10), (3,3), (4,4),
(5,5), (6,7), (7,2), (8,6), (9,8), (10,11), (11,15), (12,9), (13,14), (14,12),
(15,16), (16,13). Calculate the rank correlation coefficient for proficiencies
of this group in Mathematics and Physics.

Here $d_i$ is the difference between the two numbers in the ith pair of brackets.
It is easily verified that $\sum d_i^2 = 136$, $n^3 - n = 16 \times 255$. Consequently

$$r = 1 - \frac{6 \times 136}{16 \times 255} = \frac{4}{5} = 0.8.$$
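Formula (42) is easily applied by machine. The sketch below assumes, as the text does, that no ranks are tied; the function name `rank_correlation` is ours and purely illustrative.

```python
def rank_correlation(pairs):
    """Rank correlation by formula (42); assumes ranks 1..n with no ties."""
    n = len(pairs)
    sum_d2 = sum((x - y) ** 2 for x, y in pairs)
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks of the 16 students in Mathematics and Physics from the Example.
maths_physics = [
    (1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
    (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13),
]

r = rank_correlation(maths_physics)
print(r)
```

The same function may be applied to any other list of untied rank pairs, such as that of Ex. 7 at the end of the chapter.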


31. Bivariate probability distributions 

The theorems proved for a bivariate frequency distribution in
§§ 23-28 hold equally for a probability distribution of two variates,
relative frequency in the former case being replaced by probability
in the latter. Thus, for a discrete distribution, if $p_i$ is the probability
of the occurrence of the pair of values $(x_i, y_i)$, the moment
$\mu'_{rs}$ about the origin is

$$\mu'_{rs} = \sum_i p_i x_i^r y_i^s,$$

which is the expected value of the product $x^r y^s$. In particular the
expected values of x and y are

$$E(x) = \sum p_i x_i, \qquad E(y) = \sum p_i y_i,$$

while, corresponding to (4) and (6), we have

$$\sigma_x^2 = \mu'_{20} - [E(x)]^2, \qquad \sigma_y^2 = \mu'_{02} - [E(y)]^2$$

and

$$\mu_{11} = \mu'_{11} - E(x)E(y),$$

$\mu_{11}$ being the covariance of the variates, which is the expected value
of the product of their deviations from their means. The coefficient
of correlation is defined by (19), which may here be expressed in the
alternative form

$$r = \frac{E(x'y')}{\sigma_x\sigma_y} = \frac{E(x'y')}{\sqrt{E(x'^2)E(y'^2)}}, \qquad (19')$$

x', y' being the deviations of the variates from their expected values.

In the case of continuous probability distributions we confine
our attention to those in which the probability that the variates
will fall simultaneously in the intervals dx and dy is expressible in
the form $\phi(x, y)\,dx\,dy$, the probability density $\phi(x, y)$ being
continuous and essentially positive. Then we have formulae
corresponding to (6)-(10), with $\phi(x, y)$ in place of $f(x, y)$.

The variates are independent if the probability distribution of
each is independent of the value assumed by the other. In particular,
if the variates are continuous, with probability densities $\phi_1(x)$ and
$\phi_2(y)$ respectively, the probability that x will fall in the interval dx,
and at the same time y will fall in the interval dy, is $\phi_1(x)\,dx\,\phi_2(y)\,dy$
by the theorem of compound probability. Then the probability
density $\phi(x, y)$ for the bivariate distribution is of the form $\phi_1(x)\phi_2(y)$.
Conversely, when this relation holds, the continuous variates are
independent. Now it was proved in §§ 10 and 12 that the covariance
of two independent variates is equal to zero. It follows from the
above definition of r that the coefficient of correlation of two independent
variates is equal to zero. The converse of this theorem, however, is
not necessarily true; that is to say, uncorrelated variables are not
necessarily independent.
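That zero correlation does not imply independence may be seen from a small discrete case. The distribution below is our own illustration, not one from the text: x takes the values -1, 0, 1 with equal probability and y is always equal to x squared, so that y is completely determined by x.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Hypothetical joint probabilities p(x, y) for the pair (x, x^2).
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(g):
    """Expected value of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
covariance = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require p(x=0, y=0) = p(x=0) * p(y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))
independent_at_origin = (p_x0_y0 == p_x0 * p_y0)

print(covariance, independent_at_origin)
```

Exact fractions are used so that the vanishing of the covariance is not obscured by rounding.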

The moment generating function of the bivariate distribution is
defined as the expected value of the function $\exp(t_1 x + t_2 y)$, where
$t_1$ and $t_2$ are independent of x and y. Thus, in the case of a continuous
distribution, the m.g.f. with respect to the origin is

$$M(t_1, t_2) = \iint \exp(t_1 x + t_2 y)\,\phi(x, y)\,dx\,dy.$$

When this integral has a meaning the exponential may be expanded
in powers of $t_1$ and $t_2$; and the coefficient of $t_1^r t_2^s / r!\,s!$ is the moment
$\mu'_{rs}$ about the origin. In Ex. 3 at the end of Chapter v we shall find
this function for the bivariate normal distribution, and use it to
calculate the moments.

When the variates x and y are independent we have

$$M(t_1, t_2) = E[\exp(t_1 x + t_2 y)] = E[\exp(t_1 x)]\,E[\exp(t_2 y)]$$

$$= (\text{function of } t_1) \times (\text{function of } t_2).$$

And conversely, when the m.g.f. is of this form, the variates are
independent.


32. Variance of a sum of variates 

Consider the variates x and y, with standard deviations $\sigma_x$ and
$\sigma_y$ respectively. Their distributions determine a bivariate probability
distribution for the two variates, with correlation coefficient
r given by (19'); and each value that may be taken by their sum

$$u = x + y \qquad (43)$$

has a definite probability. Then, in virtue of § 10 (10),

$$E(u) = E(x) + E(y), \qquad (44)$$

and therefore, if x', y', u' are the deviations of the variates from
their means, we obtain from (43) and (44) by subtraction

$$u' = x' + y'.$$

Consequently

$$u'^2 = x'^2 + y'^2 + 2x'y',$$

and, on taking the expected value of each member we deduce, in
virtue of (19'),

$$\sigma_u^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y. \qquad (45)$$

This is the required formula for the variance of the sum of the
variates. And, since -r is the correlation between x and -y, it
follows that the variance of the difference

$$v = x - y$$

is given by

$$\sigma_v^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y. \qquad (46)$$

If r is zero, as in the case of independence of x and y, we have the
simple result

$$\sigma_u^2 = \sigma_x^2 + \sigma_y^2 = \sigma_v^2, \qquad (47)$$

already proved in §§ 10, 16 and 16. 





More generally, let u be a linear function of the variates x, y, z, ...,
so that

$$u = ax + by + cz + \ldots, \qquad (48)$$

the constants a, b, c, ... being either positive or negative. Then

$$E(u) = aE(x) + bE(y) + cE(z) + \ldots$$

and therefore by subtraction

$$u' = ax' + by' + cz' + \ldots,$$

the primes indicating that the variates are measured from their
means. On squaring both sides, and taking expected values, we
obtain the required formula

$$\sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + \ldots + 2ab\,r_{xy}\sigma_x\sigma_y + \ldots, \qquad (49)$$

in which $r_{xy}$ is the coefficient of correlation between x and y.
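Formula (49) may be checked numerically for two variates. The following sketch uses a small set of hypothetical paired observations, equally weighted, and compares the variance of ax + by computed directly with the value given by the formula; all names and figures are ours.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Mean square deviation from the mean (divisor N, as in the text)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(variance(xs) * variance(ys))

# Hypothetical paired observations.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [3.0, 1.0, 6.0, 5.0, 9.0]
a, b = 2.0, -3.0  # constants of either sign are allowed

# Variance of u = a*x + b*y computed directly from the values of u.
direct = variance([a * x + b * y for x, y in zip(xs, ys)])

# The same variance from formula (49), restricted to two variates.
sx, sy = sqrt(variance(xs)), sqrt(variance(ys))
r = correlation(xs, ys)
by_formula = a ** 2 * sx ** 2 + b ** 2 * sy ** 2 + 2 * a * b * r * sx * sy

print(direct, by_formula)
```

Taking a = b = 1 reproduces (45), and a = 1, b = -1 reproduces (46).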


COLLATERAL READING 

Ezekiel, 1930, 2, chapters iii-v and viii-ix.
Yule and Kendall, 1937, 1, chapter xi.
Kenney, 1939, 3, part i, chapter viii.
Camp, 1934, 1, part i, chapters viii and ix.
Jones, 1924, 3, chapters x and xi.
Rietz, 1927, 2, pp. 77-88.
Rietz (ed.), 1924, 1, chapter viii (to p. 129).
Rider, 1939, 6, §§ 18, 19, 22-24.
Bowley, 1920, 2, part ii, chapters vi and vii.
Plummer, 1940, 1, chapter v.
Mills, 1938, 1, chapter x.
Goulden, 1939, 2, chapters vi and vii.
Snedecor, 1938, 3, chapters vi and vii.
Tippett, 1931, 2, chapter vii.


EXAMPLES IV 

1. Show that, if a and b are constants and r is the correlation
between x and y, then the correlation between ax and by is equal to
r if the signs of a and b are alike, and to -r if they are different.

Also show that, if the constants a, b, c are positive, the correlation
between (ax + by) and cy is equal to

$$(ar\sigma_x + b\sigma_y)/\sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\sigma_x\sigma_y}.$$

2. The variables x and y are connected by the equation
ax + by + c = 0. Show that the correlation between them is -1 if
the signs of a and b are alike, and +1 if they are different.

3. Show that, if x', y' are the deviations of the variables from
their means,

$$r = 1 - \frac{1}{2N}\sum\left(\frac{x'}{\sigma_x} - \frac{y'}{\sigma_y}\right)^2$$

and

$$r = -1 + \frac{1}{2N}\sum\left(\frac{x'}{\sigma_x} + \frac{y'}{\sigma_y}\right)^2,$$

and deduce that $-1 \le r \le 1$. (Rietz, 1927, 2, p. 84.)

4. The variates x and y have zero means, the same variance $\sigma^2$
and zero correlation. Show that

$$(x\cos\alpha + y\sin\alpha) \quad \text{and} \quad (x\sin\alpha - y\cos\alpha)$$

have the same variance $\sigma^2$ and zero correlation.

5. Weighted mean with minimum variance. Let $x_i$ (i = 1, ..., n)
be n independent variates with variances $\sigma_i^2$. If the variates are
given weights $w_i$, their weighted mean is

$$\sum w_i x_i \Big/ \sum w_i = \sum c_i x_i,$$

where

$$c_i = w_i \Big/ \sum w_i.$$

We can show that the variance of this weighted mean is least when
the weights are inversely proportional to variances $\sigma_i^2$. In virtue
of (49) the variance of the weighted mean is

$$\sum c_i^2\sigma_i^2 = \sum w_i^2\sigma_i^2 \Big/ \Big(\sum w_i\Big)^2.$$

For a minimum the partial derivatives of this with respect to the
$w_i$ must be zero. This requires

$$w_i\sigma_i^2\Big(\sum w_i\Big)^2 - \Big(\sum w_i^2\sigma_i^2\Big)\Big(\sum w_i\Big) = 0,$$

so that

$$w_i\sigma_i^2 = \Big(\sum w_i^2\sigma_i^2\Big)\Big/\Big(\sum w_i\Big),$$

showing that $w_i\sigma_i^2$ is the same for all values of i. Thus the weights
of the variates are inversely proportional to their variances.

Show that this minimum variance is equal to H/n, where H is
the harmonic mean of the variances $\sigma_i^2$.
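The result of Ex. 5 may be illustrated numerically. The sketch below takes three hypothetical variances, forms the variance of the weighted mean for weights inversely proportional to them, and compares it with H/n and with the variance obtained from equal weights. The function name is ours.

```python
def weighted_mean_variance(weights, variances):
    """Variance of a weighted mean of independent variates."""
    total = sum(weights)
    return sum(w * w * v for w, v in zip(weights, variances)) / total ** 2

variances = [1.0, 2.0, 4.0]               # hypothetical values
optimal = [1.0 / v for v in variances]    # weights inversely proportional
n = len(variances)
harmonic_mean = n / sum(1.0 / v for v in variances)

minimum = weighted_mean_variance(optimal, variances)
equal_weights = weighted_mean_variance([1.0] * n, variances)

print(minimum, harmonic_mean / n, equal_weights)
```

Any other positive weights may be substituted for `optimal`; the variance obtained should never fall below `minimum`.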




6. For a given bivariate distribution find the straight line for
which the sum of the squares of the normal deviations is a minimum.

Let the straight line be $x\cos\alpha + y\sin\alpha = p$. Then we have to
minimize the sum of squares $\sum f_i(x_i\cos\alpha + y_i\sin\alpha - p)^2$ by equating
to zero its partial derivatives with respect to p and α. Show, from
the first of the equations thus obtained, that the required line passes
through the mean of the distribution. Then, taking the mean as
origin, show from the second equation that α is given by

$$\tan 2\alpha = \frac{2\mu_{11}}{\sigma_x^2 - \sigma_y^2}.$$

Of the two directions at right angles, found from this equation, one
makes the sum of squares a minimum, and the other a maximum,
for lines through the mean of the distribution. (L. J. Reed, Metron,
vol. I, 1921, part 3, pp. 54-61.)

7. The ranks of the same 15 students in Mathematics and Latin
were as follows, the two numbers within brackets denoting the ranks
of the same student: (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3),
(8,1), (9,11), (10,15), (11,9), (12,5), (13,14), (14,12), (15,13).
Show that the rank correlation coefficient is 0.51.

8. The marks, x and y, gained by 1,000 students for theory and
laboratory work respectively, are grouped with common class
interval of 5 marks for each variable, the frequencies for the various
classes being shown in the correlation table below.

  y \ x      42    47    52    57    62    67    72    77    82   Totals
    52        3     9    19     4     -     -     -     -     -       35
    57        9    26    37    25     6     -     -     -     -      103
    62       10    38    74    45    19     6     -     -     -      192
    67        4    20    59    96    54    23     7     -     -      263
    72        -     4    30    54    74    43     9     -     -      214
    77        -     -     7    18    31    50    19     5     -      130
    82        -     -     -     2     5    13    15     8     3       46
    87        -     -     -     -     -     2     5     8     2       17
  Totals     26    97   226   244   189   137    55    21     5    1,000

The values of x and y indicated are the mid-values of the classes. Show that the
coefficient of correlation is 0.68, and the regression equation of
y on x

$$y = 29.7 + 0.656x.$$
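The working of Ex. 8 may be checked by direct summation over the cells of the table. In the sketch below the frequencies are entered so that every row and every column adds to the total given in the margin; where a figure in the table is hard to read, that requirement has been used to settle it, which is an assumption on our part. Empty cells are entered as zero.

```python
from math import sqrt

x_mid = [42, 47, 52, 57, 62, 67, 72, 77, 82]   # theory marks (columns)
y_mid = [52, 57, 62, 67, 72, 77, 82, 87]       # laboratory marks (rows)
freq = [
    [3, 9, 19, 4, 0, 0, 0, 0, 0],
    [9, 26, 37, 25, 6, 0, 0, 0, 0],
    [10, 38, 74, 45, 19, 6, 0, 0, 0],
    [4, 20, 59, 96, 54, 23, 7, 0, 0],
    [0, 4, 30, 54, 74, 43, 9, 0, 0],
    [0, 0, 7, 18, 31, 50, 19, 5, 0],
    [0, 0, 0, 2, 5, 13, 15, 8, 3],
    [0, 0, 0, 0, 0, 2, 5, 8, 2],
]

# Flatten the table into (x, y, frequency) triples.
cells = [(x, y, f) for y, row in zip(y_mid, freq) for x, f in zip(x_mid, row)]
N = sum(f for _, _, f in cells)

x_bar = sum(f * x for x, _, f in cells) / N
y_bar = sum(f * y for _, y, f in cells) / N
var_x = sum(f * (x - x_bar) ** 2 for x, _, f in cells) / N
var_y = sum(f * (y - y_bar) ** 2 for _, y, f in cells) / N
cov = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in cells) / N

r = cov / sqrt(var_x * var_y)
slope = cov / var_x                  # regression coefficient of y on x
intercept = y_bar - slope * x_bar

print(N, r, slope, intercept)
```

With the table as entered, the results should agree with the stated coefficient and regression equation to the number of figures given.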

9. The logarithm of the m.g.f., $M(t_1, t_2)$, of § 31 is the cumulative
function $K(t_1, t_2)$, and the cumulant $\kappa_{rs}$ is the coefficient of $t_1^r t_2^s / r!\,s!$
in the expansion of $K(t_1, t_2)$ in powers of $t_1$ and $t_2$. Show that

$$\kappa_{11} = \mu_{11}, \quad \kappa_{21} = \mu_{21}, \quad \kappa_{31} = \mu_{31} - 3\mu_{20}\mu_{11}, \quad \kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2.$$

10. By means of the identity

$$\sum f(u - v)^2 = \sum fu^2 + \sum fv^2 - 2\sum fuv$$

and the fact that u - v is constant along each of one set of the
diagonal lines of a correlation table, the sum of products $\sum fuv$ may
be calculated without difficulty. Tabulate the quantities u - v,
$(u - v)^2$, f and $f(u - v)^2$ for each diagonal line of the correlation
table on p. 77, and deduce that $\sum f(u - v)^2 = 1147$. Using the
values found for $\sum fu^2$ and $\sum fv^2$ verify that $\sum fuv = 1109$.

Deduce the same result from the identity

$$\sum f(u + v)^2 = \sum fu^2 + \sum fv^2 + 2\sum fuv,$$

using the other set of diagonal lines.



CHAPTER V 


FURTHER CORRELATION THEORY. 
CURVED REGRESSION LINES 

33. Arrays. Linear regression 

In the numerical illustration of § 29 all the values of x in any one
vertical array were regarded as equal to the mid-value of x for that
interval. Since the actual values of x vary over a range of 5 years,
the results obtained cannot be regarded as more than a good approximation.
A better approximation could be obtained by choosing a
smaller class interval; but that would make the numerical work
correspondingly heavier. For accurate work the class interval must
be so small that all the values of x in a vertical array are either
exactly or very nearly equal; and similarly all the y's in a horizontal
array. The theoretical proof of certain formulae, however, is not
made any more difficult by a choice of arrays sufficiently numerous
to satisfy the above conditions; for our sums are just as easy to
manage whether the number of arrays is large or small. And, if
we are dealing with a continuous distribution, we may take
infinitesimal class intervals, dx and dy, and replace our sums by
definite integrals. We assume then that, in the ith vertical array, all
the x's have the same value, $x_i$.

It will be convenient to extend our subscript notation as follows.
Though the x's in the ith vertical array all have the same value, the
y's are different. A typical pair of values in the array is $(x_i, y_{ij})$,
represented by the point $P_{ij}$, and we shall denote its frequency by $f_{ij}$.
Thus the first subscript, i, indicates the vertical array, while the
second, j, indicates the position in that array. Denoting the total
frequency in the ith array by $n_i$ we have

$$n_i = \sum_j f_{ij}, \qquad (1)$$

$\sum_j$ indicating summation within that array. The mean value of y
in this array will be denoted by $\bar{y}_i$. Thus

$$n_i\bar{y}_i = \sum_j f_{ij}y_{ij}. \qquad (2)$$

The mean of the array is represented by the point $G_i$, whose
coordinates are $(x_i, \bar{y}_i)$, and $H_i$ is the point with abscissa $x_i$ on the line of
regression of y on x. The mean, G, of the distribution also lies on
this line of regression, and has coordinates $(\bar{x}, \bar{y})$ given by

$$N\bar{x} = \sum_i\sum_j f_{ij}x_i = \sum_i n_i x_i, \qquad N\bar{y} = \sum_i\sum_j f_{ij}y_{ij} = \sum_i n_i\bar{y}_i. \qquad (3)$$

The summation over the whole distribution is thus divided into
two summations; for $\sum_j$ denotes summation for terms in the same
vertical array, and then $\sum_i$ indicates summation of the result over
all the arrays.

It is worth noting that the same equation is obtained for the line
of regression of y on x, if each value of y in any array is replaced by
the mean value of y in that array. For the equation of this regression
line depends only on $\bar{x}$, $\bar{y}$, $\sigma_x^2$ and $\mu_{11}$. And, since the above change
does not affect the x's, $\bar{x}$ and $\sigma_x^2$ are unaltered by it. So also is $\bar{y}$,
in virtue of the second equation (3). As for $\mu_{11}$ we see that

$$N\mu_{11} = \sum_i\sum_j f_{ij}x_i y_{ij} - N\bar{x}\bar{y} = \sum_i n_i x_i\bar{y}_i - N\bar{x}\bar{y}$$

by (3), and is therefore unaltered by the above change of y's.
Consequently, the regression line of y on x is unaltered. It follows
that, if the means of the vertical arrays are collinear, their line must
coincide with the line of regression of y on x. This regression is then
said to be linear. Similarly, the regression of x on y is said to be
linear if the means of the horizontal arrays all lie on the line of
regression of x on y. It is possible for either regression to be linear,
or both.

It was shown in § 26 that $S_y^2 = \sigma_y^2(1 - r^2)$ gives the mean square
deviation of points from the line of regression of y on x. If this
regression is linear the means of the vertical arrays lie on this line
of regression; and, if further the vertical arrays all have the same
variance, this common variance must be $S_y^2$. The s.d. of each array
is then $\sigma_y\sqrt{1 - r^2}$. When the vertical arrays all have the same
variance, the regression of y on x is said to be homoscedastic.

34. Correlation ratios 

Consider next the deviation $G_iP_{ij}$ of the point $P_{ij}$ from the mean
of the vertical array in which it lies. The sum of the squares of
these deviations for the whole distribution is denoted by $NS_y'^2$,
so that

$$NS_y'^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2. \qquad (4)$$

Then, by analogy with § 26 (22), the correlation ratio* $\eta_y$ of y on x
is defined by

$$S_y'^2 = \sigma_y^2(1 - \eta_y^2), \qquad (5)$$

$\eta_y$ being regarded as positive. Thus

$$\eta_y^2 = 1 - S_y'^2/\sigma_y^2, \qquad (6)$$

which corresponds to § 26 (21). From (5) it is clear that $\eta_y^2 \le 1$.
Also it is easy to show that $\eta_y^2 \ge r^2$. This is evident from a comparison
of (5) with § 26 (22), since $S_y'^2 \le S_y^2$, the sum of the squares of the
deviations in any array being least when they are measured from
the mean of the array. Thus

$$r^2 \le \eta_y^2 \le 1. \qquad (7)$$

When the regression of y on x is linear, the straight line of means of
arrays coincides with the line of regression, and $\eta_y^2$ is then equal
to $r^2$. A non-zero value of $\eta_y^2 - r^2$ is thus associated with a departure
of the regression from linearity.


• ♦ Some writers denote this ‘ratio’ by 



It is clear from (6) that the more nearly $\eta_y^2$ approaches unity, the
smaller is $S_y'^2$, and therefore the closer are the points to the curve of
means of the vertical arrays. If $\eta_y^2 = 1$, then $S_y'^2 = 0$, so that all the
deviations are zero, and all the points lie on the curve of means.
There is then a functional relation between x and y. We may
therefore describe the correlation ratio $\eta_y$ as a measure of the degree to
which the association between the variables approaches a functional
relationship expressible in the form y = F(x), where F(x) is a
single-valued function of x. This should be compared with the
corresponding interpretation of r in § 26.

A convenient expression for $\eta_y$ can be found, involving the
s.d. $\sigma_{my}$ of the means of the vertical arrays, each mean being weighted
with the frequency $n_i$ of that array. This s.d. is given by

$$N\sigma_{my}^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2. \qquad (8)$$

Now

$$N\sigma_y^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y})^2$$

$$= \sum_i\sum_j f_{ij}[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})]^2$$

$$= \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2,$$

the sum of products being zero, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ vanishes for each
array. The first sum on the right is $NS_y'^2$ in virtue of (4), and
the second is $N\sigma_{my}^2$. Consequently the equation is equivalent to

$$\sigma_y^2 = S_y'^2 + \sigma_{my}^2. \qquad (9)$$

Thus the variance of the y's of the distribution is expressible as the
sum of two parts, of which the first is the variance within the
arrays, and the second the variance of the weighted means of the
arrays. Also, comparing (9) with (6), we see that

$$\eta_y^2 = \sigma_{my}^2/\sigma_y^2,$$

and therefore

$$\eta_y = \sigma_{my}/\sigma_y. \qquad (10)$$

The correlation ratio $\eta_y$ is therefore the ratio of the s.d. of the
weighted means of the arrays of y's to the s.d. of all the y's of the
distribution.

In the same way, by considering horizontal arrays in each of
which the value of y is constant, we may define the correlation ratio
$\eta_x$ of x on y. For, if $G_j$ is the mean of the jth horizontal array, with
abscissa $\bar{x}_j$, and $NS_x'^2$ denotes the sum of the squares of the
deviations of the points, each from the mean of the horizontal array in
which it lies, $\eta_x^2$ is given by

$$S_x'^2 = \sigma_x^2(1 - \eta_x^2), \qquad (11)$$

and, as before, we have the relations

$$r^2 \le \eta_x^2 \le 1 \qquad (12)$$

and

$$\eta_x = \sigma_{mx}/\sigma_x, \qquad (13)$$

where $\sigma_{mx}$ is the s.d. of the means of the horizontal arrays, each
mean being weighted with the frequency of that array.


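35. Calculation of correlation ratios

Formula (10), giving the value of $\eta_y$, is clearly equivalent to

$$\eta_y^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 / N\sigma_y^2. \qquad (14)$$

Now the sum in the numerator may be evaluated without
calculating the deviations $\bar{y}_i - \bar{y}$. For

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i\bar{y}_i^2 - 2\bar{y}\sum_i n_i\bar{y}_i + N\bar{y}^2.$$

But

$$n_i\bar{y}_i = \sum_j f_{ij}y_{ij} = T_i,$$

where $T_i$ is the sum of the y's in the ith vertical array. Hence

$$\sum_i n_i\bar{y}_i = \sum_i T_i = N\bar{y} = T, \qquad (15)$$

where T is the sum of the y's for the whole distribution. We may
therefore write

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i \frac{T_i^2}{n_i} - \frac{2T^2}{N} + \frac{T^2}{N} = \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N},$$

and (14) becomes

$$\eta_y^2 = \left(\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}\right) \Big/ N\sigma_y^2. \qquad (16)$$

Since the deviation from the mean is independent of the choice of
origin, while numerator and denominator of (14) are altered in the
same ratio by change of unit, the value of $\eta_y^2$ calculated from (16)
is unaffected by change of origin and unit.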

Example. Calculate the correlation ratios for the distribution of ages of
bridegroom and bride given in § 29.

Working in terms of v and the 5-year unit we have

$$T = -371, \qquad N = 1{,}000, \qquad T^2/N = 137.64.$$

The values of $T_i$ are given in the row prefixed V. Hence

$$\sum_i \frac{T_i^2}{n_i} = \frac{(-28)^2}{17} + \frac{(-339)^2}{312} + \ldots + \frac{(28)^2}{7} = 876.81,$$

$$N\sigma_v^2 = 1393.4.$$

Consequently

$$\eta_y^2 = \frac{876.81 - 137.64}{1393.4} = \frac{739.17}{1393.4} = 0.5305,$$

and $\eta_y = 0.729$ nearly. This is only slightly greater than the value 0.726 found for r.

The reader should verify, in the same way, that $\eta_x^2 = 0.5504$, so that
$\eta_x = 0.742$. The departure of the regressions from linearity is only slight.
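The equivalence of (14) and (16), and the inequality (7), may be illustrated on a small table. The frequencies below are hypothetical and chosen only to keep the arithmetic short; `arrays` maps each value of x to the (y, frequency) pairs of its vertical array.

```python
# Hypothetical grouped data: arrays[x] = list of (y, frequency) pairs.
arrays = {
    1: [(2, 3), (3, 5), (4, 2)],
    2: [(3, 2), (5, 6), (6, 2)],
    3: [(4, 4), (5, 4), (7, 2)],
    4: [(3, 1), (4, 6), (5, 3)],
}

N = sum(f for cells in arrays.values() for _, f in cells)
T = sum(f * y for cells in arrays.values() for y, f in cells)
y_bar = T / N
var_y = sum(f * (y - y_bar) ** 2
            for cells in arrays.values() for y, f in cells) / N

numerator_14 = 0.0      # sum of n_i (ybar_i - ybar)^2, as in (14)
sum_T2_over_n = 0.0     # sum of T_i^2 / n_i, as in (16)
for cells in arrays.values():
    n_i = sum(f for _, f in cells)
    T_i = sum(f * y for y, f in cells)
    numerator_14 += n_i * (T_i / n_i - y_bar) ** 2
    sum_T2_over_n += T_i ** 2 / n_i

eta2_by_14 = numerator_14 / (N * var_y)
eta2_by_16 = (sum_T2_over_n - T ** 2 / N) / (N * var_y)

# Square of the ordinary correlation coefficient, for comparison with (7).
x_bar = sum(f * x for x, cells in arrays.items() for _, f in cells) / N
var_x = sum(f * (x - x_bar) ** 2
            for x, cells in arrays.items() for _, f in cells) / N
cov = sum(f * (x - x_bar) * (y - y_bar)
          for x, cells in arrays.items() for y, f in cells) / N
r2 = cov ** 2 / (var_x * var_y)

print(eta2_by_14, eta2_by_16, r2)
```

The two values of the squared correlation ratio should agree, and neither should be less than the squared correlation coefficient.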


36. Other relations 

We shall now prove that the mean square deviation of the
weighted means of the vertical arrays from the line of regression
of y on x is equal to $\sigma_y^2(\eta_y^2 - r^2)$, that is to say

$$\sum_i n_i(\bar{y}_i - Y_i)^2 = N\sigma_y^2(\eta_y^2 - r^2), \qquad (17)$$

where $Y_i$ is the estimate of y at $x = x_i$ from the regression equation § 25 (16).
For, subtraction of (6) from § 26 (21) shows that

$$N\sigma_y^2(\eta_y^2 - r^2) = N(S_y^2 - S_y'^2)$$

$$= \sum_i\sum_j f_{ij}\{[(y_{ij} - \bar{y}_i) + (\bar{y}_i - Y_i)]^2 - (y_{ij} - \bar{y}_i)^2\}$$

$$= \sum_i n_i(\bar{y}_i - Y_i)^2,$$

the sum of products vanishing, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ is zero for each
array. Thus (17) has been established. The difference $\eta_y^2 - r^2$ is zero
only when each of the deviations $\bar{y}_i - Y_i$ is zero. In that case the
means of arrays all lie on the line of regression, and the regression
of y on x is linear. We thus see again that a non-zero value of $\eta_y^2 - r^2$
is associated with departure of the regression from linearity.

We may prove, for use in a later chapter, that the sum of squares
$\sum_i n_i(\bar{y}_i - \bar{y})^2$ may be resolved into two separate sums. Thus

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i[(\bar{y}_i - Y_i) + (Y_i - \bar{y})]^2$$

$$= \sum_i n_i(\bar{y}_i - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2, \qquad (18)$$

the sum of products vanishing, since

$$\sum_i n_i(\bar{y}_i - Y_i)(Y_i - \bar{y}) = \sum_i\sum_j f_{ij}(y_{ij} - Y_i)(Y_i - \bar{y}) = 0$$

in virtue of § 27 (30). The sum in the first member of (18) is $N\sigma_{my}^2$,
or $N\eta_y^2\sigma_y^2$ by (10). The first sum on the right of (18) has just been
proved equal to $N\sigma_y^2(\eta_y^2 - r^2)$; and the final sum has the value
$Nr^2\sigma_y^2$, in virtue of § 27 (32). The equation (18) thus corresponds to
the identity

$$N\eta_y^2\sigma_y^2 = N\sigma_y^2(\eta_y^2 - r^2) + Nr^2\sigma_y^2. \qquad (19)$$


37. Continuous distributions 

The preceding proofs may be adapted to a continuous distribution
by an appropriate change of notation, and a replacement of sums by
definite integrals. The vertical array, whose abscissa is x, we assume
to be of infinitesimal breadth dx. Then, if f(x, y) is the relative
frequency density, the relative frequency for this array is

$$\int [f(x, y)\,dx]\,dy = f_1(x)\,dx, \qquad (20)$$

where

$$f_1(x) = \int f(x, y)\,dy, \qquad (21)$$

the integration with respect to y extending over the whole of that
array. The mean $\bar{y}_x$ of the y's in the array is given by

$$\bar{y}_x f_1(x) = \int y f(x, y)\,dy, \qquad (22)$$

and this is the equation of the curve of means of the vertical arrays.
The mean of the whole distribution has coordinates

$$\bar{x} = \iint x f(x, y)\,dx\,dy = \int x f_1(x)\,dx,$$

$$\bar{y} = \iint y f(x, y)\,dx\,dy = \int \bar{y}_x f_1(x)\,dx, \qquad (23)$$

integration with respect to x including all the vertical arrays. The
above relations correspond to the equations (1), (2) and (3) for a
discrete distribution.

The mean square deviation of the values of y, from the means
of the arrays in which they lie, is given by

$$S_y'^2 = \iint (y - \bar{y}_x)^2 f(x, y)\,dx\,dy, \qquad (24)$$

and $\eta_y$ is defined as before by an equation of the form (5) or (6).
The variance of the weighted means of the vertical arrays is

$$\sigma_{my}^2 = \int (\bar{y}_x - \bar{y})^2 f_1(x)\,dx \qquad (25)$$

and, as before, this may be expressed in terms of $\sigma_y^2$ and $S_y'^2$, since

$$\sigma_y^2 = \iint (y - \bar{y})^2 f(x, y)\,dx\,dy$$

$$= \iint [(y - \bar{y}_x) + (\bar{y}_x - \bar{y})]^2 f(x, y)\,dx\,dy$$

$$= S_y'^2 + \sigma_{my}^2, \qquad (26)$$

the integral of the product being equal to zero; and from this it
follows, as in the case of a discrete distribution, that

$$\eta_y = \sigma_{my}/\sigma_y.$$

Formulae corresponding to those of § 36 may be similarly
established. For, by subtraction of (24) from the result in the Example
of § 26,

$$S_y^2 - S_y'^2 = \iint [(y - Y)^2 - (y - \bar{y}_x)^2] f(x, y)\,dx\,dy$$

$$= \iint \{[(y - \bar{y}_x) + (\bar{y}_x - Y)]^2 - (y - \bar{y}_x)^2\} f(x, y)\,dx\,dy$$

$$= \iint (\bar{y}_x - Y)^2 f(x, y)\,dx\,dy = \int (\bar{y}_x - Y)^2 f_1(x)\,dx,$$

and this is the mean square deviation of the weighted means of
arrays from the line of regression of y on x. And, corresponding to
(18), we have the resolution

$$\int (\bar{y}_x - \bar{y})^2 f_1(x)\,dx = \int [(\bar{y}_x - Y) + (Y - \bar{y})]^2 f_1(x)\,dx$$

$$= \int (\bar{y}_x - Y)^2 f_1(x)\,dx + \int (Y - \bar{y})^2 f_1(x)\,dx,$$

the various integrals corresponding to terms of the identity (19).


38. Bivariate normal distribution 

We shall now consider briefly the continuous bivariate distribution,
which is a generalization of the normal distribution discussed
in Chapter iii. It may be introduced simply as follows.* Assume
first that the variable x is normally distributed with s.d. $\sigma_1$. Then,
if the variable is measured from its mean, the probability that a
random value of x will fall in the interval dx is

$$dP_1 = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma_1^2}\right)dx.$$

Assume next that the regression of y on x is linear and homoscedastic.
Then, if $\sigma_2$ is the s.d. of y in the distribution, the common
variance of the arrays of y's is $\sigma_2^2(1 - \rho^2)$, where ρ is the† coefficient
of correlation between the variables. Finally, assume that each
array of y's is normally distributed. Then, since the mean of each
array is on the line of regression

$$y = \rho x\sigma_2/\sigma_1, \qquad (27)$$

and the variance of each array is as stated, the probability that a
value of y, taken at random in an assigned vertical array, will fall
in the interval dy is

$$dP_2 = \frac{1}{\sigma_2\sqrt{2\pi(1 - \rho^2)}}\exp\left[-\frac{(y - \rho x\sigma_2/\sigma_1)^2}{2\sigma_2^2(1 - \rho^2)}\right]dy.$$

* Cf. Rietz, 1927, 2, pp. 104-7.

† In the theory of sampling, r refers to the sample, and $\sigma_1$, $\sigma_2$, ρ to the
population from which the sample is drawn.

By the theorem of compound probability, the chance of a pair of
values (x, y) falling in the elementary rectangle dxdy is

$$dP_1\,dP_2 = \frac{dx\,dy}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right].$$

The probability density $\phi(x, y)$ for the distribution is therefore

$$\phi(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right]. \qquad (28)$$

Such a distribution is called a bivariate normal distribution, and the
variables are said to be normally correlated. The surface $z = \phi(x, y)$
is the normal correlation surface. Since (28) is of the same form in x
as in y, we may conclude that the regression of x on y is also linear,
the variance of each array of x's being $\sigma_1^2(1 - \rho^2)$. The values of x
in each such array are normally distributed, with mean on the line
of regression of x on y, whose equation is

$$x = \rho y\sigma_1/\sigma_2. \qquad (29)$$

The m.g.f. for the distribution is found below,* and from it the first
few moments are calculated.

The nature of the normal correlation surface is indicated in the
diagram. The curves along which the probability density (or the
relative frequency density) is constant are the homothetic ellipses

$$\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = \text{const.} \qquad (30)$$

With respect to these the line of regression of y on x is conjugate to
the y-axis. For the locus of the mid-points of chords x = const. is
the straight line (27). Similarly the line of regression of x on y is
conjugate to the x-axis.†
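The construction of the density (28) as the product of a normal density for x and a normal density for y within each array may be checked numerically. The sketch below evaluates both sides at a few points; the values chosen for the two standard deviations and for the correlation are hypothetical, and the function names are ours.

```python
from math import exp, pi, sqrt

def marginal_x(x, s1):
    """Normal density of x, measured from its mean, with s.d. s1."""
    return exp(-x * x / (2 * s1 * s1)) / (s1 * sqrt(2 * pi))

def array_density_y(y, x, s1, s2, rho):
    """Normal density of y within the vertical array of abscissa x."""
    mean = rho * x * s2 / s1                # on the regression line (27)
    var = s2 * s2 * (1 - rho * rho)         # common variance of the arrays
    return exp(-(y - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def joint_density(x, y, s1, s2, rho):
    """Bivariate normal density in the form (28)."""
    q = x * x / s1 ** 2 - 2 * rho * x * y / (s1 * s2) + y * y / s2 ** 2
    scale = 2 * pi * s1 * s2 * sqrt(1 - rho * rho)
    return exp(-q / (2 * (1 - rho * rho))) / scale

s1, s2, rho = 1.5, 0.8, 0.6                 # hypothetical parameters
points = [(0.0, 0.0), (1.0, -0.5), (-2.0, 1.2)]
for x, y in points:
    product = marginal_x(x, s1) * array_density_y(y, x, s1, s2, rho)
    print(x, y, product, joint_density(x, y, s1, s2, rho))
```

The last two columns printed should agree at every point, whatever admissible parameters are substituted.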


* See Ex. V, 3.

† For further properties of this distribution see Exx. 4, 5 and 6 at the
end of this chapter.


39. Intraclass correlation

Let us now consider the correlation between the measures of
some common characteristic for pairs of members of the same family
or class. For example, we may be interested in the correlation
between the weights of brothers, or the heights of sisters. The
relation between two members of the same family is a reciprocal
one; for, if P belongs to the same family as Q, then Q belongs to the
same family as P. Each pair of members, P and Q, will therefore
contribute two entries to the correlation table. In one of them x
will be the measure of the characteristic for P, and y the measure
for Q; in the other, x will be the measure for Q, and y for P. The table
will thus be symmetrical.

We shall consider only the case in which each of the h families
has the same number, k, of members. Then there are k(k - 1) pairs
of values for each family, and

$$N = hk(k - 1) \qquad (31)$$

gives the total number of pairs of values in the table. Let $x_{ij}$ denote
the measure of the characteristic for the jth member of the ith
family. Thus the first subscript indicates the family, and the
second the particular member of that family. Consequently i takes
the values 1, 2, ..., h, and j the values 1, 2, ..., k. The x's and the y's
of the correlation table are the hk values $x_{ij}$ in different orders. Any
one value $x_{ij}$, occurring as an x in the table, will have as its y each
of the other k - 1 values for the same family. Thus each value $x_{ij}$
occurs k - 1 times as an x; and the mean $\bar{x}$ for the bivariate
distribution is therefore

$$\bar{x} = \frac{1}{N}\sum_i\sum_j (k - 1)x_{ij} = \frac{1}{hk}\sum_i\sum_j x_{ij}, \qquad (32)$$

and, since the table is symmetrical, $\bar{y}$ has the same value. The
variance $\sigma_x^2$ of the x's is the same as the variance of the hk values $x_{ij}$,
since each of these occurs the same number of times in the
distribution. Thus

$$hk\sigma_x^2 = \sum_i\sum_j (x_{ij} - \bar{x})^2, \qquad (33)$$

and $\sigma_y^2$ has the same value, in virtue of symmetry. We may denote
this common variance by $\sigma^2$, so that

$$\sigma^2 = \sigma_x^2 = \sigma_y^2. \qquad (34)$$

The coefficient of intraclass correlation, r, is given by the usual
formula, which in this case is equivalent to

$$hk(k - 1)\sigma^2 r = \sum_i\sum_j\sum_l (x_{ij} - \bar{x})(x_{il} - \bar{x}) \qquad (35)$$

(j, l = 1, 2, ..., k; j ≠ l; i = 1, 2, ..., h).

The sum is a triple one; for the product $(x_{ij} - \bar{x})(x_{il} - \bar{x})$ is the
product of the deviations from the mean for the jth and lth members
of the ith family, and the sum must include all such products for
different values of j and l, and for all the families. We carry out first
the summation with respect to l, observing that l takes all integral
values from 1 to k except the value j. Thus the sum of the terms
$x_{il}$ includes all values for the ith family except $x_{ij}$, and may therefore
be written $k\bar{x}_i - x_{ij}$, where $\bar{x}_i$ is the mean for the ith family. The sum
of the terms $\bar{x}$ for the k - 1 values of l is $(k - 1)\bar{x}$. The triple sum in
(35) may therefore be written as the double sum

$$\sum_i\sum_j (x_{ij} - \bar{x})[k\bar{x}_i - x_{ij} - (k - 1)\bar{x}]$$

$$= \sum_i\sum_j (x_{ij} - \bar{x})(k\bar{x}_i - k\bar{x}) - \sum_i\sum_j (x_{ij} - \bar{x})^2.$$

Now by (33) the last sum on the right is equal to $hk\sigma^2$. To evaluate
the other we first carry out the summation with respect to j. Thus

$$\sum_i\sum_j (x_{ij} - \bar{x})(k\bar{x}_i - k\bar{x}) = k\sum_i (\bar{x}_i - \bar{x})(k\bar{x}_i - k\bar{x})$$

$$= k^2\sum_i (\bar{x}_i - \bar{x})^2 = k^2 h\sigma_m^2,$$

where $\sigma_m^2$ is the variance of the means of the families.
Substituting these values in (35) we obtain

$$hk(k - 1)\sigma^2 r = hk^2\sigma_m^2 - hk\sigma^2.$$

The coefficient of intraclass correlation is therefore given by the
formula

$$r = \frac{k\sigma_m^2 - \sigma^2}{(k - 1)\sigma^2}. \qquad (36)$$
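The formula just obtained may be compared with the coefficient found directly from the symmetrical table. The sketch below uses hypothetical measurements for h = 4 families of k = 3 members each; the table is formed by entering every ordered pair of distinct members of the same family.

```python
# Hypothetical measurements: one inner list per family, k members each.
families = [
    [68.0, 70.0, 71.0],
    [64.0, 65.0, 63.0],
    [72.0, 69.0, 73.0],
    [66.0, 67.0, 70.0],
]
h, k = len(families), len(families[0])

# Route 1: variance of all values and variance of the family means.
values = [x for family in families for x in family]
grand_mean = sum(values) / (h * k)
var_all = sum((x - grand_mean) ** 2 for x in values) / (h * k)
family_means = [sum(family) / k for family in families]
var_means = sum((m - grand_mean) ** 2 for m in family_means) / h
r_formula = (k * var_means - var_all) / ((k - 1) * var_all)

# Route 2: the symmetrical table of hk(k - 1) ordered pairs.
pairs = [(family[j], family[l])
         for family in families
         for j in range(k) for l in range(k) if j != l]
N = len(pairs)
mean_x = sum(x for x, _ in pairs) / N        # equals the mean of the y's
var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / N
cov = sum((x - mean_x) * (y - mean_x) for x, y in pairs) / N
r_direct = cov / var_x                       # the two variances are equal

print(r_formula, r_direct)
```

The second route grows as the square of k and is given only as a check; the first is the one suited to computation.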


Curved Regression Lines

40. Polynomial regression. Normal equations 

We have seen that the line of regression of y on x gives the best
representation of the behaviour of y with change of x, that can be
given by a straight line, the term 'best' indicating that the sum of
the squares of the deviations (or 'residuals') is less for this straight
line than for any other. But it is often apparent from the data that
the regression of y on x is far from linear; and it may be desirable to
find an equation of regression which affords a better representation
of the behaviour of y than can be given by a straight line. The
simplest type of non-linear regression equation is that in which
one of the variables is expressed as a polynomial in the other.
Polynomial regression of y on x is therefore represented by an
equation of the form

$$y = b_0 + b_1x + b_2x^2 + \ldots + b_kx^k, \qquad (37)$$

in which the coefficients $b_r$ are constants. Our problem is to
determine these constants so that the sum of the squares of the residuals
is a minimum. The choice of the degree, k, of the polynomial is at
our disposal. If there are n different pairs of values in the
distribution, we can make the curve pass through all the representative
points by choosing k = n - 1. But, to keep the arithmetical work
reasonably simple, k must be fairly small, say 2, 3, or 4. The
distribution of points in the scatter diagram will frequently suggest the
shape of the curve of regression, and thus a suitable value for k.
Having decided on the value of k, we determine the coefficients
$b_r$ by the method of least squares.

With the notation of § 23 suppose there are $n$ different pairs of
values, $f_i$ being the frequency of the pair $(x_i, y_i)$, and the total
frequency $N$ being given by

$$N = \sum_{i=1}^{n} f_i. \tag{38}$$


Let $H_i$ be the point on the curve (37) with abscissa $x_i$. Then its
ordinate has the value

$$Y_i = b_0 + b_1 x_i + \ldots + b_k x_i^k = \sum_{s=0}^{k} b_s x_i^s, \tag{39}$$

and the deviation of the point $P_i$ from the curve of regression is

$$H_i P_i = y_i - Y_i. \tag{40}$$

The sum of the squares of the deviations is

$$N S^2 = \sum_i f_i (y_i - Y_i)^2. \tag{41}$$


We have to choose the coefficients $b_s$ so that this sum is a minimum;
and this is done by equating to zero the partial derivatives of $S^2$
with respect to these coefficients. We thus obtain the $k + 1$ normal
equations

$$\sum_i f_i x_i^s (y_i - Y_i) = 0 \tag{42}$$

$$(i = 1, 2, \ldots, n;\; s = 0, 1, 2, \ldots, k).$$

Written separately, and in terms of the values of the given distribution,
these are

$$\begin{aligned}
\sum_i f_i\,(y_i - b_0 - b_1 x_i - \ldots - b_k x_i^k) &= 0, \\
\sum_i f_i x_i\,(y_i - b_0 - b_1 x_i - \ldots - b_k x_i^k) &= 0, \\
&\;\;\vdots \\
\sum_i f_i x_i^k\,(y_i - b_0 - b_1 x_i - \ldots - b_k x_i^k) &= 0.
\end{aligned} \tag{42'}$$





The coefficients $b_s$ are determined from these $k + 1$ equations, which
involve the sums of powers of $x$ from $\sum f_i x_i$ to $\sum f_i x_i^{2k}$, and sums of
products from $\sum f_i y_i$ to $\sum f_i x_i^k y_i$. For the particular case $k = 1$ these
equations are the same as § 25 (12).

When the values $x_i$ correspond to equal increments, $h$, and the
distribution of the frequencies $f_i$ is symmetrical about $\bar{x}$, the
equations (42') may be much simplified. Suppose first that $n$ is odd
and equal to $2m + 1$. Then, by taking the origin of $x$ at the middle
value of that variable, and the common increment $h$ as unit of
measurement, we have for the values of $x$ in the distribution

$$-m,\; -(m-1),\; \ldots,\; -1,\; 0,\; 1,\; \ldots,\; (m-1),\; m.$$

Hence, owing to the symmetry of the distribution of $f$'s, the sums
of the odd powers of $x$ are all zero. The sums of the even powers
may be written down from tables or algebraical formulae. If,
however, $n$ is even and equal to $2m$, we take the origin of $x$ at the
mean of the middle pair of values, and $\tfrac{1}{2}h$ as the new unit. The
values of $x$ then become

$$-(2m-1),\; \ldots,\; -3,\; -1,\; 1,\; 3,\; \ldots,\; (2m-1),$$

and the sums of the odd powers of $x$ vanish as before. Moreover,
the equations (42') then consist of two groups, one involving
$b_0, b_2, \ldots$ and the other $b_1, b_3, \ldots$. Their solution is thus simplified.

Example. Fit a parabolic curve of regression of $y$ on $x$ to the seven pairs
of values

x:   1.0   1.5   2.0   2.5   3.0   3.5   4.0

y:   1.1   1.3   1.6   2.0   2.7   3.4   4.1

The dot diagram of the seven pairs of values suggests a parabola. Since $n$
is odd, and the values of $x$ correspond to equal increments 0.5, we take the
origin at the middle value 2.5 with 0.5 as the new unit. This is equivalent to
the transformation

$$u = 2x - 5.$$

Each frequency $f_i$ is unity. The sums of odd powers of $u$ are zero, and the
calculation may be arranged as in the accompanying table. The normal
equations are

$$16.2 = 7b_0 + 28b_2, \qquad 14.3 = 28b_1, \qquad 69.9 = 28b_0 + 196b_2,$$

which give immediately

$$b_0 = 2.07, \qquad b_1 = 0.511, \qquad b_2 = 0.061.$$




The regression equation is therefore

$$y = 2.07 + 0.511u + 0.061u^2 = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2,$$

which simplifies to $y = 1.04 - 0.20x + 0.24x^2$.


   x       u       y      u^2     u^4      uy     u^2 y

  1.0     -3      1.1      9      81     -3.3      9.9
  1.5     -2      1.3      4      16     -2.6      5.2
  2.0     -1      1.6      1       1     -1.6      1.6
  2.5      0      2.0      0       0      0        0
  3.0      1      2.7      1       1      2.7      2.7
  3.5      2      3.4      4      16      6.8     13.6
  4.0      3      4.1      9      81     12.3     36.9

 Totals    0     16.2     28     196     14.3     69.9
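
The arithmetic of the foregoing example can be reproduced mechanically. The sketch below is an illustration only and not the author's procedure: it forms the normal equations (42') from the power sums and product sums and solves them by Gauss-Jordan elimination in exact fractions. The function name `fit_polynomial` and its arguments are hypothetical.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, k, fs=None):
    """Return [b_0, ..., b_k] satisfying the normal equations (42')."""
    fs = fs or [1] * len(xs)
    # Power sums sum(f x^m) for m = 0..2k, and product sums sum(f x^s y).
    power = [sum(f * x ** m for f, x in zip(fs, xs)) for m in range(2 * k + 1)]
    prod = [sum(f * x ** s * y for f, x, y in zip(fs, xs, ys)) for s in range(k + 1)]
    # Augmented matrix: row s reads sum_t b_t * power[s + t] = prod[s].
    a = [
        [Fraction(power[s + t]) for t in range(k + 1)] + [Fraction(prod[s])]
        for s in range(k + 1)
    ]
    # Gauss-Jordan elimination with a simple search for a non-zero pivot.
    for c in range(k + 1):
        pivot = next(r for r in range(c, k + 1) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k + 1):
            if r != c:
                a[r] = [vr - a[r][c] * vc for vr, vc in zip(a[r], a[c])]
    return [row[-1] for row in a]


# Data of the example, in the transformed variable u = 2x - 5.
u = [-3, -2, -1, 0, 1, 2, 3]
y = [Fraction(s) for s in ("1.1", "1.3", "1.6", "2.0", "2.7", "3.4", "4.1")]
coeffs = fit_polynomial(u, y, k=2)
print([round(float(b), 3) for b in coeffs])  # [2.071, 0.511, 0.061]
```

Because the sums of odd powers of $u$ vanish, the elimination here does no more than separate the equation for $b_1$ from the pair involving $b_0$ and $b_2$, which is the simplification noted in the text.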


41. Index of correlation 

When considering the line of regression of $y$ on $x$, we saw that
$|r|$ is the coefficient of correlation between the values $y_i$ and their
estimates found from the regression equation, and that this
coefficient is connected with the mean square deviation from the
line of regression by the formula $S^2 = \sigma_y^2(1 - r^2)$. We propose to
show that a corresponding relation holds for polynomial regression.
Multiplying the normal equations (42') by $b_0, b_1, \ldots, b_k$ respectively
and adding we obtain, in virtue of (39),

$$\sum_i f_i Y_i (y_i - Y_i) = 0. \tag{43}$$

Consequently (41) is equivalent to

$$N S^2 = \sum_i f_i y_i (y_i - Y_i) = \sum_i f_i y_i^2 - \sum_i f_i Y_i^2. \tag{44}$$

From the first of the normal equations, $\sum_i f_i (y_i - Y_i) = 0$, it follows
that the $y$'s and the $Y$'s have the same mean. If this is taken as
origin for both variables, their variances are given by

$$N\sigma_y^2 = \sum_i f_i y_i^2, \qquad N\sigma_Y^2 = \sum_i f_i Y_i^2,$$

and the coefficient of correlation, $R$, between the $y_i$ and the $Y_i$ by

$$R\,\sigma_y \sigma_Y = \frac{1}{N} \sum_i f_i y_i Y_i = \frac{1}{N} \sum_i f_i Y_i^2 = \sigma_Y^2$$

in virtue of (43), so that

$$R\,\sigma_y = \sigma_Y. \tag{45}$$

Consequently (44) is equivalent to

$$S^2 = \sigma_y^2 - \sigma_Y^2 = \sigma_y^2 (1 - R^2), \tag{46}$$

which is analogous to § 26 (21).

The coefficient $R$ is thus an indication of the closeness with which
the points of the scatter diagram approximate to the regression
curve (37). If $R = 1$, then $S^2 = 0$, and all the points lie on the
curve. For this reason $R$ is often referred to as the index of correlation
for the regression curve (37). With each curve of regression
there is associated such an index, given by

$$R^2 = 1 - \frac{S^2}{\sigma_y^2}, \tag{47}$$

where $S^2$ is the mean square deviation of the points from the
curve of regression. In virtue of (43) and (44) $N S^2$ in the case
of polynomial regression is given by

$$N S^2 = \sum_{i=1}^{n} f_i y_i^2 - \sum_{s=0}^{k} b_s \sum_{i=1}^{n} f_i x_i^s y_i. \tag{48}$$

Further, the relations (30) and (31) of § 27 hold also for polynomial
regression. For, in virtue of (43) and the first of the normal
equations, we have

$$\sum_i f_i (y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{49}$$

Then the sum of the squares of the deviations of the $y$'s from their
mean may be expressed

$$\sum_i f_i (y_i - \bar{y})^2 = \sum_i f_i \,[(y_i - Y_i) + (Y_i - \bar{y})]^2
 = \sum_i f_i (y_i - Y_i)^2 + \sum_i f_i (Y_i - \bar{y})^2, \tag{50}$$

in consequence of (49). This equation corresponds to the identity

$$N\sigma_y^2 = N\sigma_y^2 (1 - R^2) + N R^2 \sigma_y^2,$$

as is evident from (45) and (46).
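
As an illustration of (47) and (50), the index of correlation may be computed for the parabola fitted in the example of § 40. This is a sketch under stated assumptions: the coefficients 29/14, 143/280 and 17/280 are the exact solutions of the normal equations of that example (they round to 2.07, 0.511 and 0.061), and the variable names are illustrative rather than taken from the text.

```python
from fractions import Fraction

# Data of the example of § 40 in the variable u = 2x - 5 (each frequency is 1).
u = [-3, -2, -1, 0, 1, 2, 3]
y = [Fraction(s) for s in ("1.1", "1.3", "1.6", "2.0", "2.7", "3.4", "4.1")]

# Exact least-squares coefficients b_0, b_1, b_2 (assumed from the normal equations).
b = [Fraction(29, 14), Fraction(143, 280), Fraction(17, 280)]

n = len(y)
y_bar = sum(y) / n
# Ordinates Y_i of the regression curve, as in (39).
Y = [sum(bs * ui ** s for s, bs in enumerate(b)) for ui in u]

total = sum((yi - y_bar) ** 2 for yi in y)               # N * sigma_y^2
residual = sum((yi - Yi) ** 2 for yi, Yi in zip(y, Y))   # N * S^2
explained = sum((Yi - y_bar) ** 2 for Yi in Y)           # N * sigma_Y^2

# Index of correlation from (47).
R2 = 1 - (residual / n) / (total / n)

print(total == residual + explained)   # identity (50)
print(float(R2))                       # close to 1: the parabola fits well
```

The value of $R^2$ falls only slightly short of unity, in keeping with the small residuals of the fitted parabola.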

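Example. Prove, as in § 36, that $\sigma_y^2(\eta_{yx}^2 - R^2)$ is the mean square deviation
of the weighted means of the vertical arrays from the curve of regression.
Also that, if $n_i$ is the frequency in the $i$th vertical array,

$$\sum_i n_i (\bar{y}_i - \bar{y})^2 = \sum_i n_i (\bar{y}_i - Y_i)^2 + \sum_i n_i (Y_i - \bar{y})^2,$$

corresponding to the identity

$$N\sigma_y^2 \eta_{yx}^2 = N\sigma_y^2 (\eta_{yx}^2 - R^2) + N R^2 \sigma_y^2.$$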




42. Some related regressions 

A plotting of the values of $x$ against the logarithms of the corresponding
values of $y$ may indicate an approximation to a linear
relation between $x$ and $\log y$. In such a case we find a regression
equation of the form

$$y = c\,a^x, \tag{51}$$

where $c$ and $a$ are constants. For this relation is equivalent to

$$\log y = \log c + x \log a.$$

If then we find the line of regression of $\log y$ on $x$, the constants of
the equation determine both $c$ and $a$.
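
The reduction of (51) to a linear regression may be sketched as follows. The function name and the data are hypothetical, chosen so that the points lie exactly on $y = 3 \cdot 2^x$; each pair is given unit frequency, and natural logarithms are used, though any base would serve.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = c * a**x by the line of regression of log y on x."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Slope and intercept of the least-squares line log y = log c + x log a.
    slope = sum((x - mean_x) * (v - mean_log) for x, v in zip(xs, logs)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), math.exp(slope)  # (c, a)


# Hypothetical data lying exactly on y = 3 * 2**x.
c, a = fit_exponential([0, 1, 2, 3, 4], [3, 6, 12, 24, 48])
print(round(c, 6), round(a, 6))
```

It should be observed that the residuals minimized in this way are those of $\log y$, not of $y$ itself, so the fitted curve is not in general the least-squares curve for the original values.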

Similarly, if the plotting of $\log x$ against $\log y$ indicates that the
relation between these quantities approximates to linear, we find
a regression equation of the form

$$y = c\,x^b, \tag{52}$$

which is equivalent to

$$\log y = \log c + b \log x.$$

We have therefore to find the line of regression of $\log y$ on $\log x$, and
the constants of the equation determine $c$ and $b$.

We might also find a regression equation of the form

$$y = c\,a^{f(x)}, \tag{53}$$

where $f(x)$ is a polynomial in $x$. For this is equivalent to

$$\log y = \log c + (\log a) f(x),$$

and therefore requires a polynomial regression of $\log y$ on $x$.

More generally, the principle of least squares may be employed
to find a regression equation of the form*

$$y = b_1 X_1 + b_2 X_2 + \ldots + b_p X_p,$$

where $X_1, \ldots, X_p$ are any functions of the independent variable $x$.
The argument used in § 40 leads to the normal equations

$$\sum_i f_i X_1 (y_i - Y_i) = 0, \quad \ldots, \quad \sum_i f_i X_p (y_i - Y_i) = 0,$$

* Cf. Fisher, Annals of Eugenics, vol. ix, part 3, p. 238 (1939).




for the determination of the coefficients $b_s$. Multiplying these
equations by $b_1, \ldots, b_p$ respectively and adding, we find

$$\sum_i f_i Y_i (y_i - Y_i) = 0,$$

so that the sum of squares of the deviations from the regression
curve is

$$N S^2 = \sum_i f_i y_i (y_i - Y_i) = \sum_i f_i y_i^2 - \sum_i f_i Y_i^2,$$

as in § 41. If one of the quantities $X_s$ is taken as unity (or a constant),
the mean of the $y$'s is equal to that of the $Y$'s. The argument of § 41
then applies to the present case, leading to (45), (46), (49) and (50).


COLLATERAL READING 

Yule and Kendall, 1937, 1, chapters xii, xiii and xv-xviii.
Kenney, 1939, 3, part i, chapter vii; part ii, chapter iv.
Ezekiel, 1930, 2, chapters vi and vii.
Aitken, 1939, 1, chapters v and vi.
Rider, 1939, 0, §§ 20, 21 and 25-28.
Rietz (ed.), 1924, 1, pp. 120-38.
Jones, 1924, 3, chapter xix.
Camp, 1934, 1, part i, chapter x.
Mills, 1938, 1, chapter xii.
Plummer, 1940, 1, chapter v.
Rietz, 1927, 2, §§ 32, 37 and 38.
Goulden, 1939, 2, chapter xiv.
Coolidge, 1925, 2, chapters viii and ix.
Tippett, 1931, 2, chapter ix.
Harris, 1913, 3.


EXAMPLES V 

1. The heights of brothers, in five families of three each, are
as follows: (67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73) and
(71, 72, 73) inches. Show that the mean heights for the families
are 68, 69, 70, 71, 72 inches, and the general mean $\bar{x} = 70$. Also
$\sigma^2 = 18/5$, $\sigma_m^2 = 2$, and the coefficient of intraclass correlation of
heights for the five families is 1/3.

2. Verify that, for the distribution of Ex. IV, 8, the correlation
ratios are $\eta_{yx} = 0.695$ and $\eta_{xy} = 0.685$ approximately.




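3. Moments and m.g.f. for the bivariate normal distribution. The
moments may be deduced from the m.g.f. as defined in § 31. To
calculate this function let $\sigma_1$ and $\sigma_2$ be the s.d.'s of $x$ and $y$ in the
distribution, and $\rho$ the correlation between these variates. Then
the m.g.f. is given by

$$M(t_1, t_2) = c \iint \exp\left[ t_1 x + t_2 y - \frac{1}{2(1-\rho^2)} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1 \sigma_2} + \frac{y^2}{\sigma_2^2} \right) \right] dx\,dy,$$

where

$$c = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}.$$

By transforming the exponential (completing the square in $x$) we may write this:

$$M(t_1, t_2) = c \iint \exp\left[ t_1 x + t_2 y - \frac{(x - \rho\sigma_1 y/\sigma_2)^2}{2\sigma_1^2 (1-\rho^2)} - \frac{y^2}{2\sigma_2^2} \right] dx\,dy.$$

Carrying out the integration with respect to $x$, and rearranging the
exponential, we have

$$M(t_1, t_2) = \exp\left[ \tfrac{1}{2} (\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2) \right],$$

which is the required m.g.f.

To calculate the various moments we have only to expand this
function in powers of $t_1$ and $t_2$. The expansion is

$$1 + \tfrac{1}{2} (\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2) + \tfrac{1}{4} (\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)^2 / 2! + \ldots.$$

Since $\mu_{rs}$ is the coefficient of $t_1^r t_2^s / r!\,s!$ in this expansion we have

$$\mu_{20} = \sigma_1^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2,$$
$$\mu_{22} = (1 + 2\rho^2)\,\sigma_1^2\sigma_2^2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3, \quad \mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4,$$

and so on.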

4. Show that the area of the ellipse, § 38 (30), of constant probability
density is $\pi\lambda^2 \sigma_1 \sigma_2 / \sqrt{1-\rho^2}$, and hence that the area of the
strip between the ellipses corresponding to the parameter values
$\lambda$ and $\lambda + d\lambda$ is $2\pi\lambda\,\sigma_1\sigma_2\,d\lambda / \sqrt{1-\rho^2}$. Deduce the probability

$$\exp[-\lambda^2 / 2(1-\rho^2)]\;\lambda\,d\lambda / (1-\rho^2)$$

that a pair of values $(x, y)$ chosen at random will be represented by
a point inside this strip; and hence by integration the probability
that the point will fall inside the ellipse $\lambda$ is

$$1 - \exp[-\lambda^2 / 2(1-\rho^2)].$$

If this probability is $\tfrac{1}{2}$ then $\lambda^2 = 1.3863\,(1-\rho^2)$.
Also show that, for a given value of $d\lambda$, the probability that the
point $(x, y)$ will fall in the strip between the ellipses $\lambda$ and $\lambda + d\lambda$ is
a maximum when $\lambda^2 = 1 - \rho^2$. This determines the 'ellipse of
maximum probability'. (Cf. Rietz, 1927, 2, pp. 108-10.)

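5. The variates $x$ and $y$ are normally correlated, and $\xi$, $\eta$ are
defined by

$$\xi = x\cos\theta + y\sin\theta, \qquad \eta = y\cos\theta - x\sin\theta.$$

Show that $\xi$, $\eta$ will be uncorrelated if

$$\tan 2\theta = 2\rho\sigma_1\sigma_2 / (\sigma_1^2 - \sigma_2^2).$$

The above transformation corresponds to a rotation of rectangular
axes of coordinates; and $\theta$ determines the directions of the principal
axes of the ellipses of constant density.

Show that, if $\xi$, $\eta$ are thus uncorrelated, and $\sigma_\xi^2$, $\sigma_\eta^2$ are their
variances, then

$$\sigma_\xi \sigma_\eta = \sigma_1 \sigma_2 \sqrt{1-\rho^2}, \qquad \sigma_\xi^2 + \sigma_\eta^2 = \sigma_1^2 + \sigma_2^2.$$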

6. Show that, if $\xi$, $\eta$ are independent normal variates, and $x$, $y$
are defined by

$$x = \xi\cos\theta + \eta\sin\theta, \qquad y = \eta\cos\theta - \xi\sin\theta,$$

the coefficient of correlation between $x$ and $y$ is given by (cf. Ex. 5)

$$r^2 = \frac{(\sigma_\xi^2 - \sigma_\eta^2)^2 \sin^2 2\theta}{(\sigma_\xi^2 - \sigma_\eta^2)^2 \sin^2 2\theta + 4\sigma_\xi^2 \sigma_\eta^2},$$

which is numerically greatest when $\theta = \pm\tfrac{1}{4}\pi$. The extreme values
of $r$ are $\pm(\sigma_\xi^2 - \sigma_\eta^2)/(\sigma_\xi^2 + \sigma_\eta^2)$.

Also show that the points of inflexion of sections of the normal
correlation surface, by planes through the $z$-axis, lie on the elliptic
cylinder $\xi^2 \sigma_\eta^2 + \eta^2 \sigma_\xi^2 = \sigma_\xi^2 \sigma_\eta^2$. (Cf. Yule, 1897, 1.)

7. The profits, £$y$, of a certain company in the $x$th year of its
life are given by

x:      1      2      3      4      5

y:   1250   1400   1650   1950   2300

Taking $u = x - 3$ and $v = (y - 1650)/50$, show that the parabolic
regression of $v$ on $u$ is

$$v + 0.086 = 5.30u + 0.643u^2,$$

and deduce that the parabolic regression of $y$ on $x$ is

$$y = 1140 + 72.14x + 32.14x^2.$$


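8. Let $x$, $y$ be normally correlated variates with zero means as
in § 38. Writing

$$w = \frac{x}{\sigma_1}, \qquad z = \frac{1}{\sqrt{1-\rho^2}} \left( \frac{y}{\sigma_2} - \frac{\rho x}{\sigma_1} \right),$$

show that

$$\frac{\partial(w, z)}{\partial(x, y)} = \frac{1}{\sigma_1 \sigma_2 \sqrt{1-\rho^2}}$$

and

$$w^2 + z^2 = \frac{1}{1-\rho^2} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1 \sigma_2} + \frac{y^2}{\sigma_2^2} \right).$$

Deduce that the joint probability differential of $w$ and $z$ is

$$dP = \frac{1}{2\pi} \exp[-\tfrac{1}{2}(w^2 + z^2)]\,dw\,dz,$$

and hence that $w$, $z$ are independent normal variates, with zero
means and unit s.d.'s. In other words $w$ and $z$ are independent
standard normal variates.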



CHAPTER VI 


THEORY OF SIMPLE SAMPLING 

43. Random sampling from a population 

In order to examine a large population with respect to a specified 
characteristic, the statistician chooses a sample of individuals from 
that population and, from the properties of the sample relating to 
the given characteristic, he endeavours to estimate those of the 
population. Suppose that the characteristic considered is the height 
of the individual. Then the assemblage of heights of all the individuals 
in the population is called a population or universe of heights, and 
those of the individuals in the sample is a sample of heights from that 
population. Similarly, we might consider populations of weights, 
wages, yields of grain, etc. In the same way, if our consideration is 
the percentage of male births in a very large population of births, 
we may be obliged, in estimating this percentage, to confine our 
attention to the data provided by a sample of such births. The theory 
of sampling is concerned, first, with estimating the properties of the 
population from those of the sample, and secondly, with gauging 
the precision of the estimates, i.e. with ascertaining the deviations 
from the true values that may be expected in the estimates obtained. 

Fundamental to the theory is the concept of random sampling. 
This is defined by the property that, in the selection of an individual 
from the population, each member of the population has the same 
chance of being chosen.* Statisticians have developed techniques 
for ensuring, as far as possible, that their sampling is random; but 
the reader who wishes to study the details of these techniques must 
consult other works.†

Sampling of Attributes 

44. Simple sampling of attributes 

In the sampling of attributes, as distinct from the sampling of
values of a variable such as height, we are concerned only with the
possession or non-possession of some specified attribute or characteristic
by the individual selected in sampling. For instance, in sampling
from births we may be concerned only with whether the baby is male or
not. In sampling from a population of men our consideration may
be whether they are smokers or non-smokers. The choosing of an
individual in sampling may be called a 'trial', and the possession
of the specified attribute by the individual selected a 'success'.
Simple sampling is random sampling with the further provision that
the probability $p$ of success is the same at each trial. Thus $p$ is a
constant in the process; and the probability of success at any trial
is independent of the success or failure of preceding trials. The value
of $p$ is the relative frequency of the occurrence of the attribute in
the population from which the sample is drawn. Hence, for the
sampling to be simple, either the population must be very large, or
the individual selected must be returned to the population before
the next trial, success or failure having been noted.

* See also Kendall, 1941, 2.
† Yule and Kendall, 1937, 1, pp. 336-46.

The problem connected with the drawing of a simple sample of
$n$ members is thus identical with that of a series of $n$ independent
trials, with constant probability $p$ of success; and the results of
§§ 11 and 17 are applicable. The probabilities of 0, 1, 2, ... successes
in a simple sample of $n$ members are thus the terms of the binomial
expansion of $(q + p)^n$. The binomial probability distribution thus
determined is called the sampling distribution of the number of
successes in the sample. The expected value, or mean value, of the
number of successes is therefore $np$; the variance is $npq$, and the
standard deviation is $\sqrt{npq}$. This s.d. is usually called the standard
error (s.e.) of the number of successes in a sample of size $n$, the
deviation from the expected value $np$ being looked upon as 'error'.
The proportion of successes in a sample is obtained by dividing the
number of successes by $n$. The expected value of the proportion of
successes is therefore $p$; and the s.d. of the proportion of successes
is $1/n$ times that of the number of successes, i.e. $\sqrt{pq/n}$. Thus

s.e. of the number of successes = $\sqrt{npq}$,

s.e. of the proportion of successes = $\sqrt{pq/n}$.

The precision of the proportion of successes observed in the sample
is regarded as inversely proportional to the s.e. of this proportion.



Hence the precision of the observed proportion varies as $\sqrt{n}$. In
particular, to double the precision it is necessary to increase the
size of the sample four-fold.

45. Large samples. Test of significance 

As we have just seen, the sampling distributions of the number
and the proportion of successes in a simple sample of size $n$ are
binomial distributions. It was also shown in Chapter III that, for
large values of $n$, the binomial distribution approximates to a
normal distribution, in the sense that the probabilities for corresponding
intervals in the two distributions tend to equality as $n$
increases indefinitely. Now we know that the probability that a
random value of a normal variate, of s.d. $\sigma$, will lie outside the
interval which extends $3\sigma$ on each side of the mean is only 0.0027.
Similarly, the probability that the value will deviate from the
mean by more than $2\sigma$ is 0.0455, or about 4½ %. We may therefore
conclude that, for large values of $n$, the probability that the number
of successes in a simple sample of $n$ members will differ from the
mean by more than three times the s.e. is also very small; and that
a deviation of more than twice the s.e. is rather unusual.

Bearing this in mind we have a test of the credibility of the
hypothesis that a given large sample, of $n$ members, was obtained
by simple sampling from a population in which the relative frequency
of the occurrence of the attribute considered is $p$. Suppose it is
found that the number of successes in the sample differs from the
expected value $np$ by more than $3\sqrt{npq}$. Then an event has happened
which, on the hypothesis of simple sampling, is very improbable.
We conclude then that the truth of this hypothesis is itself
very improbable, and we say that the difference is highly significant.
Consideration of the aspects of the problem will then lead us to
suspect either that the value of $p$ employed is incorrect, or else that
the conditions of simple sampling were not observed. A deviation
from the mean less than twice the s.e. is regarded as not significant.
For deviations greater than twice the s.e. the significance increases
with the deviation. The dividing line between significance and non-significance
is, of course, not sharply defined. But significance is
usually regarded as beginning where the probability of a larger
deviation is less than 5 %. It may be remarked that, while the
above test may furnish evidence against the hypothesis, it cannot
prove the hypothesis to be correct. The most it can do in its favour
is to provide no evidence against it.

The above argument still holds if, in place of the number of
successes and its s.e., we employ the proportion of successes in the
sample and its s.e. The expected value of this proportion in simple
sampling is $p$; and we compare the deviation of the actual proportion,
from this value, with the s.e., $\sqrt{pq/n}$, of this proportion. In some
cases the value of $p$ in the population is not known, but must be
estimated from the sample. The estimate obtained from a large
sample may be used without serious error in place of the true value,
since the s.e. of the proportion of successes is small when $n$ is large.

Example. A certain cubical die was thrown 9,000 times, and a 5 or a 6 was
obtained 3,240 times. On the assumption of random throwing, do the data
indicate an unbiased die?

On the hypothesis of an unbiased die the chance of throwing a 5 or a 6 is
1/3. Thus $p = 1/3$ and $q = 2/3$. The expected number of successes is therefore
3,000, and the deviation of the actual number from this value is 240. The
s.e., $\epsilon$, of the number of successes is

$$\epsilon = \sqrt{npq} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = 10\sqrt{20} = 44.72.$$

The deviation 240 is nearly 5.4 times this s.e.; and it is therefore most unlikely
to appear as a result of simple sampling with $p = 1/3$. We therefore
conclude that the die is almost certainly biased, and that $p$ is not equal to 1/3.

The estimate of $p$ obtained from the sample is 3,240/9,000 = 0.36. The
s.e. of the proportion of successes is then

$$\epsilon' = \sqrt{0.36 \times 0.64 / 9000} = 0.0050,$$

and $3\epsilon' = 0.015$. It is therefore most unlikely that the true value of $p$ lies
outside the range $0.36 \pm 0.015$. In other words, the true value of $p$ almost
certainly lies between 0.345 and 0.375.
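
The steps of this test can be set out as a short computation. What follows is only a sketch: the variable names are illustrative, and the three-s.e. limits are the rough convention adopted in the text, not an exact probability statement.

```python
import math

n = 9000          # number of throws
successes = 3240  # throws showing a 5 or a 6
p = 1 / 3         # hypothetical chance of success for an unbiased die
q = 1 - p

# s.e. of the number of successes, and the deviation measured in that unit.
se_count = math.sqrt(n * p * q)
ratio = (successes - n * p) / se_count

# Estimate of p from the sample, its s.e., and the three-s.e. limits.
p_hat = successes / n
se_prop = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 3 * se_prop, p_hat + 3 * se_prop

print(round(se_count, 2), round(ratio, 2))
print(round(se_prop, 5), round(low, 3), round(high, 3))
```

A ratio above 3 is treated as highly significant, and one below 2 as not significant, in accordance with the rule stated before the example.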

46. Comparison of large samples 

Let two populations, $A_1$ and $A_2$, be tested for the prevalence of a
certain attribute, by taking from them large simple samples of $n_1$
and $n_2$ members respectively; and let $p_1$ and $p_2$ be the observed
proportions of successes in the samples. Is the difference, $p_1 - p_2$,
significant of a real difference between the two populations with
respect to the given attribute? On the hypothesis that the populations
are similar in this respect, we may combine the samples to
estimate the common value of the relative frequency of the occurrence
of the attribute in the populations. This estimate is then

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}.$$

The s.e.'s of the proportions of successes in samples of $n_1$ and $n_2$
members are $\sqrt{pq/n_1}$ and $\sqrt{pq/n_2}$ respectively; and, since the
samples are independent, the variance $\epsilon^2$ of the difference of these
proportions is given by

$$\epsilon^2 = pq \left( \frac{1}{n_1} + \frac{1}{n_2} \right),$$

in consequence of the theorem proved in §§ 10 and 15.

On the assumption that the populations are similar, the expected
value of the difference $p_1 - p_2$ is zero, for

$$E(p_1 - p_2) = E(p_1) - E(p_2) = 0.$$

The sampling distributions of $p_1$ and $p_2$ are approximately normal,
when $n_1$ and $n_2$ are large; and the same is true of their difference
since the samples are independent. Thus the distribution of $p_1 - p_2$
is approximately normal, with mean zero and s.d. $\epsilon$. The probability
that, in simple sampling, the difference $p_1 - p_2$ will be numerically
greater than $3\epsilon$ is therefore very small. The probability that it will
be greater than $2\epsilon$ is in the neighbourhood of 5 %; and any value
smaller than this is regarded as not significant, i.e. as providing no
evidence against the hypothesis.

Example 1. In a simple sample of 600 men from a certain large city, 400
are found to be smokers. In one of 900 from another large city, 450 are
smokers. Do the data indicate that the cities are significantly different with
respect to the prevalence of smoking among men?

Here $p_1 = 2/3$, $p_2 = 1/2$, so that $p_1 - p_2 = 1/6$. On the assumption that the
cities are alike with respect to the prevalence of smoking among men, we have
as our estimate of the common value of $p$,

$$p = \frac{850}{1500} = \frac{17}{30},$$

and the variance of the difference of the proportions for the two samples is

$$\epsilon^2 = pq \left( \frac{1}{n_1} + \frac{1}{n_2} \right) = \frac{17}{30} \cdot \frac{13}{30} \left( \frac{1}{600} + \frac{1}{900} \right) = 0.000682.$$

Hence $\epsilon = 0.026$. The observed difference is greater than $6\epsilon$, and is therefore
highly significant. Our assumption that the populations are similar is
therefore almost certainly wrong.
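
The comparison of two large samples lends itself to a small routine. The sketch below is illustrative only: the function name `compare_proportions` and its arguments are hypothetical, and it does no more than pool the samples, form the variance of the difference, and express the observed difference in units of its s.e.

```python
import math


def compare_proportions(n1, successes1, n2, successes2):
    """Return (pooled p, s.e. of the difference, difference / s.e.)."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Pooled estimate of the common proportion, on the hypothesis of similarity.
    p = (successes1 + successes2) / (n1 + n2)
    q = 1 - p
    # Variance of the difference of two independent sample proportions.
    variance = p * q * (1 / n1 + 1 / n2)
    se = math.sqrt(variance)
    return p, se, (p1 - p2) / se


# Example 1: 400 smokers among 600 men, and 450 among 900.
p, se, ratio = compare_proportions(600, 400, 900, 450)
print(round(p, 4), round(se, 4), round(ratio, 2))
```

The ratio exceeds 6, which agrees with the conclusion reached in Example 1.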


Next suppose that simple samples, of $n_1$ and $n_2$ members, are
drawn from populations in which the proportions are $p_1$ and $p_2$
respectively $(p_1 > p_2)$. Is it likely that the proportions $p_1'$ and $p_2'$ in
the samples will be such that $p_1' - p_2' < 0$; in other words, is the real
difference between the populations likely to be hidden in sampling?
As before, the distribution of $p_1' - p_2'$ is approximately normal for
large values of $n_1$ and $n_2$; but the mean of the distribution is now
$p_1 - p_2$, and its variance, being the sum of the variances of $p_1'$ and
$p_2'$, is given by

$$\epsilon^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}. \tag{3}$$

In order that the sample value of $p_1' - p_2'$ should be negative, its
deviation from the mean must be on the negative side, and numerically
greater than $p_1 - p_2$. If $p_1 - p_2 > 3\epsilon$, the probability of this is
very small. If, however, $p_1 - p_2$ is much less than $2\epsilon$, such an event
would not be very unusual. For the particular case in which
$p_1 - p_2 = 2\epsilon$, the probability of the event is in or near the interval
2 to 2½ %.


Example 2. In two large populations there are 35 and 30 % of fair-haired
people. Is the difference likely to be revealed by simple samples of 1,500
and 1,000 respectively from the two populations?

Here $p_1 = 0.35$ and $p_2 = 0.30$, so that $p_1 - p_2 = 0.05$. The variance of the
difference of the proportions in the samples is

$$\epsilon^2 = \frac{(0.35)(0.65)}{1500} + \frac{(0.3)(0.7)}{1000} = 0.000362,$$

so that $\epsilon = 0.019$. The difference $p_1 - p_2$ is about $2.6\epsilon$. The probability that
the real difference between the populations will be hidden is approximately
the probability that, for a random value of a normal variate, the deviation
from the mean will be on the negative side and greater than 2.6 times the
s.d. Since this is less than ½ %, it is unlikely that the difference will be hidden.


47. Poissonian and Lexian sampling. Samples of varying size 

Consider next a few modifications of the condition of simple
sampling, beginning with Poisson's series of trials (cf. § 11, Ex. 3).
Suppose that, in drawing a sample of attributes of $n$ members, the
chance of success changes at each drawing. Let $p_i$ be the probability
of success at the $i$th drawing. Then the expected value of the
number of successes in the sample is the sum of the expectations
at the individual drawings; and this is

$$\sum_i p_i = np,$$

where $p$ is the mean of the quantities $p_i$. Also, since the drawings
are independent, the variance of the number of successes in the
sample is the sum of the variances of the numbers of successes at
the separate drawings, so that

$$\epsilon^2 = \sum_i p_i q_i = \sum_i p_i - \sum_i p_i^2.$$

If $\sigma_p^2$ is the variance of the quantities $p_i$ we may write this

$$\epsilon^2 = np - n(\sigma_p^2 + p^2) = npq - n\sigma_p^2. \tag{4}$$

Thus the variance of the number of successes is less than when the
probability remains constant and equal to $p$. Dividing by $n^2$ we
have the variance of the proportion of successes in a Poissonian
sample

$$\frac{pq}{n} - \frac{\sigma_p^2}{n}. \tag{5}$$

Consider next a Lexian series of trials.* Suppose that, in taking
$N$ simple samples of attributes of $n$ members each, the probability
of success varies from one sample to another. Let $p_i$ be the value
in the $i$th sample $(i = 1, 2, \ldots, N)$. We wish to find the s.e. of the
number of successes per sample, when the records of all the samples
are pooled. Let $p$ be the mean value of the probability, so that
$Np = \sum p_i$. Then the expected value of the number of successes in
the whole series of samples is equal to the sum of the expected values
for the individual samples, and this is

$$\sum_i n p_i = nNp.$$

Hence the expected value of the number of successes per sample is
$np$. To find the variance of the number of successes per sample,
we observe that the mean square deviation from $np$ in the $i$th
sample is $n p_i q_i + (n p_i - np)^2$.

Summing for all the $N$ samples, and equating to $N\epsilon^2$, we have

$$N\epsilon^2 = n \sum_i p_i q_i + n^2 \sum_i (p_i - p)^2.$$

* Studied by W. Lexis in 1877.

Now the last term has the value $n^2 N \sigma_p^2$, where $\sigma_p^2$ is the variance of
the quantities $p_i$. In the other sum we may write

$$\sum_i p_i q_i = \sum_i p_i - \sum_i p_i^2 = Np - N(\sigma_p^2 + p^2) = Npq - N\sigma_p^2.$$

Substituting these values in the equation, and dividing by $N$, we
have

$$\epsilon^2 = npq + n(n-1)\,\sigma_p^2, \tag{6}$$

which is the required variance of the number of successes per sample.
Dividing by $n^2$ we have the variance of the proportion of successes
per sample, viz.

$$\frac{pq}{n} + \frac{n-1}{n}\,\sigma_p^2. \tag{7}$$

Both the variances (6) and (7) are greater than in the case of simple
sampling with constant probability $p$.
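
The opposite effects of the Poissonian and Lexian schemes on the variance can be verified for particular numbers. The sketch below is an illustration under assumed values: the four probabilities and the sample size of 10 are hypothetical, and exact fractions are used so that formulae (4) and (6) can be compared with the direct sums without rounding.

```python
from fractions import Fraction as F


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical probabilities of success.
ps = [F(1, 10), F(3, 10), F(1, 2), F(7, 10)]
p = mean(ps)
q = 1 - p
var_p = mean((pi - p) ** 2 for pi in ps)  # sigma_p^2

# Poissonian sampling: one drawing at each probability, n = 4 drawings in all.
n = len(ps)
poisson_direct = sum(pi * (1 - pi) for pi in ps)
poisson_formula = n * p * q - n * var_p           # formula (4)

# Lexian sampling: N = 4 samples of m = 10 members, one probability per sample.
m = 10
lexis_direct = mean(m * pi * (1 - pi) + (m * pi - m * p) ** 2 for pi in ps)
lexis_formula = m * p * q + m * (m - 1) * var_p   # formula (6)

print(poisson_direct, poisson_formula)
print(lexis_direct, lexis_formula)
```

In the first case the variance falls below the binomial value $npq$, and in the second it rises above it, as the text asserts.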

Lastly, we may consider the modification of simple sampling in
which the probability $p$ of success remains constant, but the size $n$
of the sample varies about a mean $\bar{n}$ with variance $\sigma_n^2$. To find the
variance of the number of successes per sample, we observe first
that the mean number of successes per sample is $\bar{n}p$. Then, for
samples of size $n$, the mean square deviation of the number of
successes from $np$ is $npq$. Consequently the mean square deviation
from the general mean $\bar{n}p$ is

$$npq + (np - \bar{n}p)^2.$$

The expected value of this mean square is the required variance of
the number of successes, so that

$$\epsilon^2 = \bar{n}pq + p^2 \sigma_n^2. \tag{8}$$

We shall make use of this result in a later chapter.

Sampling of Values of a Variable 

48. Random and simple sampling

We pass now to the consideration of sampling of values of a 
variable and measurable quantity, such as height, age, yield of 
grain, etc. Each member of the population of individuals, objects 
or experiments provides a value of the variable; and we thus have 
a population of values of the variable, and the frequency distribution
determined by it. In drawing a sample of $n$ members from the
population we are choosing $n$ values of the variable from those of
the distribution.

We have already defined random sampling as sampling in which
each member of the population has the same chance of being chosen.
In the case of a population of discrete values of the variable $x$, the
value $x_i$ may occur $f_i$ times. There are then $f_i$ members of the population
each equal to $x_i$. Hence the probability of the value $x_i$ in the
selection of an individual by random sampling is $f_i/N$, which is the
relative frequency of that value in the population. Similarly, if the
population of values of $x$ has a continuous distribution with relative
frequency density $f(x)$, the probability that, in the random selection
of an individual, the value of the variable will fall in the interval
$dx$ is $f(x)\,dx$. Simple sampling is random sampling with the further
provision that, in the selection of an individual from the population,
the probability of obtaining a value of the variable within any specified
range remains constant throughout the sampling. In particular, with
a population of discrete values of the variable, the probability of obtaining
a specified value $x_i$ remains constant during the sampling.
Thus, in simple sampling, the system of probabilities associated with
any drawing is independent of the results of preceding drawings.

A population, whose distribution is continuous, contains an 
infinite number of values in any finite interval in the range of the 
variable. In drawing a finite random sample from such a population, 
the probability associated with any interval remains unchanged, 
and the sampling is therefore simple. Thus a finite random sample 
from a population whose distribution is continuous is a simple sample.
It is the common practice to refer to such a sample as a 'random
sample'. If, however, the population contains only a limited number
of values, the sampling will not be simple unless each value selected
is returned to the population before the drawing of the next value.

49. Sampling distributions. Standard errors

The distribution of the variable in the population has its mean,
variance, moments of higher order, partition values, etc., which 
are spoken of generally as the parameters of the population. Similarly,
each simple sample from the population determines a frequency
distribution of the variable, from which the mean, variance,
etc., of the sample may be calculated. These may be regarded as
estimates of the values of the parameters of the population. Any
such estimate obtained from the sample is called a statistic. More
generally any function of the sample values, used as an estimate of
a parameter of the population, is called a statistic.

When the distribution of the variable x in the population is
known, it is theoretically possible to determine the probability 
that the estimate z of any parameter, obtained from a simple sample 
of n members, will lie in the interval dz. Let this probability be 
denoted by $\phi(z)\,dz$. Then the density $\phi(z)$ determines a probability
distribution called the sampling distribution of that statistic for
simple samples of size n. Thus the sampling distribution is a continuous
distribution, determined by the nature of the population
and the size of the sample; and the s.d. of the sampling distribution
is called the standard error (s.e.) of that statistic for samples of n
members. The sampling distribution is often defined as the distribution
of the values of the statistic obtained from an infinite
(or very large) number of simple samples of the given size. This
alternative way of looking at it may be a help to the student. The
two definitions bear the same relation to each other as the a priori
definition of probability and the empirical definition. The reader 
should bear in mind that a sampling distribution is essentially a 
probability distribution. 
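
The empirical way of looking at a sampling distribution can be imitated numerically. The following is only a sketch: the normal population (mean 10, s.d. 2), the sample size 25, the number of repetitions and the helper name `simulate_standard_error` are all assumptions chosen for illustration, not part of the text. It draws many simple samples, computes the statistic for each, and takes the s.d. of the values so obtained as an estimate of the s.e.

```python
import random
import statistics


def simulate_standard_error(statistic, draw, n, repeats, seed=0):
    """Estimate the s.e. of `statistic` for simple samples of size n.

    `draw(rng)` returns one value from the (hypothetical) population.
    """
    rng = random.Random(seed)
    values = [statistic([draw(rng) for _ in range(n)]) for _ in range(repeats)]
    # The s.d. of the simulated values approximates the standard error.
    return statistics.pstdev(values)


# Assumed population: normal with mean 10 and s.d. 2; statistic: the mean.
se_mean = simulate_standard_error(
    statistics.fmean, lambda rng: rng.gauss(10.0, 2.0), n=25, repeats=5000
)
print(round(se_mean, 3))  # should lie near sigma / sqrt(n) = 0.4
```

With a very large number of repetitions the simulated value would settle towards the theoretical s.e.; with a finite number it is itself subject to fluctuations of sampling.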

Distributions of statistics, for random samples from a normal 
population, will play a prominent part in later chapters. It was 
shown in § 22 that the distribution of the means of such samples is 
normal; and we know its s.d. In general, for normal populations, or 
populations whose frequency curves are unimodal and only moderately
skew, the sampling distributions of many of the common
statistics approximate to the normal type as n increases indefinitely;
so that, for large samples, they possess the property that the probability
of a sample value of the statistic deviating from its mean by
more than three times its s.e. is very small. This enables us to apply 
the test of §46 to statistics obtained from large samples; and the 
determination of the s.e. in such cases is a matter of importance. 
Tests appropriate to small samples will be considered in Chapter x. 




50. Sampling distribution of the mean 
The distribution of the means of random samples of given size, 
from a specified population, is a question of considerable importance. 
Let the sampled population have mean $\mu$ and variance $\sigma^2$. We shall
first prove by elementary methods that the sampling distribution
of the mean $\bar{x}$, of random samples of size n, has $\mu$ for its mean and
$\sigma^2/n$ for its variance. We shall then prove more generally how all
the moments of the distribution of the mean may be deduced simply
from those of the population, by means of the properties of the
cumulative function.

Reasoning in terms of a population of discrete values, let the relative
frequency of the value $x_i$ in the population be $p_i$ (i = 1, 2, ..., k).
Then

$$\mu = \sum_i p_i x_i \tag{9}$$

and

$$\sigma^2 = \sum_i p_i (x_i - \mu)^2. \tag{10}$$

In the case of a sample of one member, the distribution of the mean
is clearly the distribution of the variable in the population. For the
single value $x_i$ in the sample, associated with probability $p_i$, is the
mean of the sample. The expected value of the mean of the sample
is therefore

$$E(x_i) = \sum_i p_i x_i = \mu.$$

Consequently the variance of the distribution of the mean of a
sample of one is

$$E(x_i - \mu)^2 = \sum_i p_i (x_i - \mu)^2 = \sigma^2,$$

as stated. For a sample of n values $x_j$ (j = 1, 2, ..., n) the mean $\bar{x}$
is given by

$$n\bar{x} = \sum_j x_j.$$

Taking the expected value of each member we have

$$E(n\bar{x}) = E\Big(\sum_j x_j\Big) = \sum_j E(x_j) = \sum_j \mu = n\mu,$$

so that

$$E(\bar{x}) = \mu. \tag{11}$$

Thus the mean of the population is the mean of the sampling distribution
of $\bar{x}$. Further, since the values in the sample are independent,
the sampling variance of their sum, $n\bar{x}$, is the sum of the variances
of the separate values. But each of these values is a sample of one
member, whose distribution has variance $\sigma^2$. Consequently the
variance of $n\bar{x}$ is $n\sigma^2$, and its s.d. is $\sigma\sqrt{n}$. The s.d. of $\bar{x}$ is therefore
$\sigma/\sqrt{n}$. This is the required s.e. of the mean.

If the distribution of values in the population is continuous, with
relative frequency density f(x), the relative frequency for the interval
dx is f(x) dx; and this is the probability of a random value
coming from that interval. To adapt the above argument we have
only to replace $p_i$ by f(x) dx, $x_i$ by x, and summation with respect
to i by integration extending over the values in the population.
Thus (9) and (10) become

$$\mu = \int x f(x)\,dx$$

and

$$\sigma^2 = \int (x - \mu)^2 f(x)\,dx.$$

Summation with respect to j remains unaltered, since it extends 
only over the n values of the sample. 
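
For a small discrete population the result of this section can be verified by direct enumeration of all possible simple samples. The sketch below is illustrative only: the three values, their relative frequencies and the sample size n = 3 are assumed for the purpose, and exact fractions are used so that the comparison with $\mu$ and $\sigma^2/n$ is exact.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 4]                                          # hypothetical values x_i
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]    # hypothetical p_i
n = 3                                                       # size of sample

# Population mean and variance, as in (9) and (10).
mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

mean_of_means = Fraction(0)
second_moment = Fraction(0)
# Enumerate every ordered simple sample of n drawings with its probability.
for idx in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in idx:
        prob *= probs[i]
    xbar = Fraction(sum(values[i] for i in idx), n)
    mean_of_means += prob * xbar
    second_moment += prob * xbar ** 2

var_of_means = second_moment - mean_of_means ** 2
print(mean_of_means == mu, var_of_means == sigma2 / n)
```

The enumeration treats the drawings as independent with constant probabilities, which is precisely the assumption of simple sampling.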

Example. A sample of 900 members is found to have a mean of 3.4 cm.
Could it be reasonably regarded as a simple sample from a large population,
whose mean is 3.25 cm. and s.d. 2.61 cm.?

The s.e. of the mean of a simple sample of 900 from such a population is
2.61/30 = 0.087 cm. The deviation of the mean of the sample from that of
the population is 0.15 cm. This deviation is less than twice the s.e. of the
mean, and is therefore not significant. We conclude that the given sample
might be one drawn from the population specified.
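
The arithmetic of this example may be reproduced as follows. This is a minimal sketch; the function name `relative_deviation` is our own and merely packages the two steps of the example.

```python
import math


def relative_deviation(sample_mean, pop_mean, pop_sd, n):
    """Return the s.e. of the mean and the deviation expressed in units of it."""
    se = pop_sd / math.sqrt(n)
    return se, (sample_mean - pop_mean) / se


se, ratio = relative_deviation(3.4, 3.25, 2.61, 900)
print(round(se, 3), round(ratio, 2))
# The ratio is less than 2, so the deviation is not significant.
```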

The various moments of the sampling distribution of $\bar{x}$ may be
found very simply* by using the additive property of cumulants,
proved in § 16. For, since $n\bar{x} = \sum_j x_j$, it follows from this property
that the cumulative function of $n\bar{x}$, being the sum of those of the
variates $x_j$, is given by

$$K(t;\, n\bar{x}) = nK(t;\, x) = n\kappa_1 t + n\kappa_2 t^2/2! + n\kappa_3 t^3/3! + \cdots,$$

the cumulants being those of the population, and the second
argument in brackets denoting the variate whose cumulative
function is indicated. But the rth moment of $\bar{x}$ is obtained from that
of $n\bar{x}$ by dividing by $n^r$. Hence the cumulative function of $\bar{x}$ is found
by substituting t/n for t in the above expansion. The cumulative
function of the distribution of the mean is therefore

$$K(t;\, \bar{x}) = \kappa_1 t + \frac{\kappa_2}{n}\,\frac{t^2}{2!} + \frac{\kappa_3}{n^2}\,\frac{t^3}{3!} + \cdots.$$

The rth cumulant for the distribution of the mean of the sample
is thus found from that of the population by dividing by $n^{r-1}$. In
particular, the mean of the distribution of $\bar{x}$ is $\kappa_1$, and is therefore
the same as the mean of the population. The variance of $\bar{x}$ is $\sigma^2/n$,
and the third moment about the mean of the distribution is $\mu_3/n^2$.

* Cf. Fisher, 1929, 1, p. 202.

That the sampling distribution of the mean is approximately
normal for large samples, when the population has only moderate
skewness and excess, also follows from the results of § 16. For,
since the rth cumulant of the distribution of the mean is obtained
from that of the population by dividing by $n^{r-1}$, the skewness of the
distribution of the mean is

$$\frac{\kappa_3}{n^2}\left(\frac{n}{\kappa_2}\right)^{3/2} = \frac{1}{\sqrt{n}}\ \text{(skewness of the population)}.$$

Similarly, the excess of kurtosis of the distribution of the mean is

$$\frac{\kappa_4}{n^3}\cdot\frac{n^2}{\kappa_2^2} = \frac{1}{n}\ \text{(excess of population)}.$$

Thus, for large samples from a population of moderate skewness
and excess, the skewness and excess of the distribution of $\bar{x}$ are
small, and the distribution is approximately normal.
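
The two relations just obtained can be checked numerically on a small skew population. In the sketch below the population, the sample size n = 4 and the function names are assumptions made for illustration; skewness is taken as $\kappa_3/\kappa_2^{3/2} = \mu_3/\mu_2^{3/2}$ and excess as $\kappa_4/\kappa_2^2 = \mu_4/\mu_2^2 - 3$, which is the reading consistent with the factors $1/\sqrt{n}$ and $1/n$ above.

```python
import math
from itertools import product


def shape(dist):
    """Skewness and excess of a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    m2, m3, m4 = (
        sum(p * (x - mean) ** r for x, p in dist.items()) for r in (2, 3, 4)
    )
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def distribution_of_mean(dist, n):
    """Exact sampling distribution of the mean of n independent drawings."""
    out = {}
    for combo in product(dist.items(), repeat=n):
        xbar = sum(x for x, _ in combo) / n
        prob = math.prod(p for _, p in combo)
        out[xbar] = out.get(xbar, 0.0) + prob
    return out


population = {0: 0.7, 1: 0.2, 6: 0.1}   # hypothetical, markedly skew
n = 4
skew_pop, excess_pop = shape(population)
skew_mean, excess_mean = shape(distribution_of_mean(population, n))
print(skew_mean, skew_pop / math.sqrt(n))
print(excess_mean, excess_pop / n)
```

Each printed pair should agree to within rounding error, whatever population is substituted, since the relations are exact for simple sampling.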


51. Normal population. Fiducial limits for unknown mean 
Suppose that the population, from which the random sample of
n values is drawn, is a normal population with mean $\mu$ and s.d. $\sigma$.
Then the sample values are independent normal variates, with the
same mean and s.d.; and, by the theorem of § 22, their mean $\bar{x}$ is
normally distributed with mean $\mu$ and s.d. $\sigma/\sqrt{n}$. If we know $\sigma$
but not $\mu$, there is a range of possible values of $\mu$ for which the
observed mean $\bar{x}$ of the sample is not significant at any specified
level of probability. In the sampling distribution of the mean, the
relative deviation of $\bar{x}$ from its expected value is the ratio of $\bar{x} - \mu$
to the s.e. of $\bar{x}$; and this ratio is $(\bar{x} - \mu)\sqrt{n}/\sigma$. If then the observed
value $\bar{x}$ is not significant at the 5 % level of probability, this
relative deviation must be less than that which, in a normal distribution,
is exceeded numerically with a probability of 5 %. Such
a relative deviation is 1.96. Consequently the observed mean $\bar{x}$
will be not significant provided

$$\frac{|\bar{x} - \mu|\sqrt{n}}{\sigma} < 1.96,$$

which requires

$$\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}.$$

The values $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ are called the 95 % fiducial limits, or
confidence limits, for the mean of the population corresponding to the
given sample. They are the limits within which $\mu$ must lie, in order
that the observed sample mean should not be significant at the
prescribed level of probability.

Similarly, we define the fiducial limits for other levels of probability.
Thus, in a normal distribution, the relative deviations
which are exceeded numerically with probabilities of 2 and 1 % are
2.33 and 2.58 respectively. Hence the 98 % fiducial limits for the
mean of the normal population, corresponding to the given sample,
are $\bar{x} \pm 2.33\,\sigma/\sqrt{n}$; and the 99 % fiducial limits are $\bar{x} \pm 2.58\,\sigma/\sqrt{n}$.


Example. Show that, in the example of the preceding section, if the
population is normal but its mean unknown, the 95 % fiducial limits for the
mean are 3.23 and 3.57 cm., and the 98 % fiducial limits 3.20 and 3.60 cm.
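
The limits asked for in this example may be computed as in the following sketch. The function `fiducial_limits` is a name of our own; it obtains the relative deviation from the standard library's normal distribution instead of from the printed table, so the multipliers are 1.960 and 2.326 to three decimals.

```python
import math
from statistics import NormalDist


def fiducial_limits(xbar, sigma, n, level):
    """Limits xbar -/+ z * sigma / sqrt(n) for a normal population of known s.d."""
    # z is the deviation exceeded numerically with probability 1 - level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


lo95, hi95 = fiducial_limits(3.4, 2.61, 900, 0.95)
lo98, hi98 = fiducial_limits(3.4, 2.61, 900, 0.98)
print(round(lo95, 2), round(hi95, 2))
print(round(lo98, 2), round(hi98, 2))
```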


52. Comparison of the means of two large samples 
Given two independent simple samples, of $n_1$ and $n_2$ members
respectively, we may wish to examine whether the difference of
their means may be accounted for by fluctuations of sampling, the
two samples being regarded as drawn from the same population of
s.d. $\sigma$. The s.e.'s of the means of samples of $n_1$ and $n_2$ members
from this population are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$ respectively. Hence, on
the assumption that the samples are independent and drawn from
this population, the s.e. $\epsilon$ of the difference of their means is given by

$$\epsilon^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{12}$$

The sampling distribution of this difference has zero for its mean,
and is approximately normal if $n_1$ and $n_2$ are large. Consequently,
if the observed difference of the means exceeds $3\epsilon$, it can hardly be
ascribed to fluctuations of sampling; and our assumption that the
samples were drawn from the same population is almost certainly
incorrect. If the difference is greater than $2\epsilon$, it is regarded as
significant at the 5 % level of probability. When the variance of the
population is not known, it may be estimated from the combined
sample of $n_1 + n_2$ members, unless the variances of the two samples
are inconsistent with the assumption that they were drawn from
the same population (see § 60 below).

If the two samples are known to have come from different populations,
with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, we can test by a
similar procedure whether the two populations may have the same
mean. The standard errors of the means of samples of $n_1$ and $n_2$
members from the two populations are $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$ respectively;
and the s.e. $\epsilon$ of their difference is then given by

$$\epsilon^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{13}$$

On the assumption that the two populations have the same mean,
the distribution of the difference of the means of the samples has
zero for its expected value, and is approximately normal for large
samples. We may therefore test the significance of the difference of
the sample means, by comparing it with $\epsilon$ in the usual manner. As
before, if the variances of the populations are not known, they may
be estimated from the large samples.


Example. A simple sample of heights of 6,400 Englishmen has a mean of
67.85 in. and a s.d. of 2.56 in., while a simple sample of heights of 1,600
Australians has a mean of 68.55 and a s.d. of 2.52 in. Do the data indicate that
Australians are on the average taller than Englishmen?

The s.e. of the mean of a sample of heights of 6,400 Englishmen is
2.56/80 = 0.032 in., and that of the mean of a sample of heights of 1,600
Australians is 2.52/40 = 0.063 in. The s.e. of the difference of the means is

$$\epsilon = \sqrt{(0.032)^2 + (0.063)^2} = \sqrt{0.004993} = 0.07 \text{ in.}$$

The observed difference between the means of the samples is 0.70 in., which is
10 times its s.e. Hence the data are inconsistent with the assumption that
the means of the two populations are equal; and we conclude that Australians
are on the average taller than Englishmen.
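
The calculation in this example follows formula (13), with the sample s.d.'s used as estimates of those of the populations. A brief sketch (the function name `se_difference` is ours):

```python
import math


def se_difference(sd1, n1, sd2, n2):
    """s.e. of the difference of two independent sample means, as in (13)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


eps = se_difference(2.56, 6400, 2.52, 1600)
ratio = (68.55 - 67.85) / eps
print(round(eps, 4), round(ratio, 1))
```

Carried to more figures the s.e. is about 0.0707 in., so that the ratio is a little under 10; the conclusion of the example is unaffected.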




53. Standard error of a partition value 

Consider the s.e. of a partition value for a large random sample
of n members, drawn from a continuous population in which the
relative frequency density of the variable is f(x). Let the partition
value be that for which p is the fraction of the frequency lying
above it, and q the fraction below it. Also let $x_p$ be the partition
value for the population, and $x_p + \delta x_p$ that for a large sample of n
items. The relative frequency of values above $x_p$ in the population
is p. In the sample the relative frequency above $x_p + \delta x_p$ is p, and
above $x_p$ it is $p + \delta p$. Thus $\delta p$ is the relative frequency in the sample
for the interval $\delta x_p$. Now for a large sample $\delta x_p$ and $\delta p$ are small;
and $\delta p$ is, to within infinitesimals of higher order, the relative
frequency for the interval $\delta x_p$ in the population. Consequently

$$\delta p = f(x_p)\,\delta x_p = y\,\delta x_p,$$

where y is the ordinate at $x_p$ for the frequency curve y = f(x) of
the population. On squaring we have

$$y^2(\delta x_p)^2 = (\delta p)^2.$$

Now y is independent of the sample, and $\delta x_p$ and $\delta p$ vary about
zero as mean. Hence, on taking expected values of each side,
we obtain

$$y^2 E(\delta x_p)^2 = E(\delta p)^2.$$

Now $E(\delta x_p)^2$ is the variance of the sampling distribution of $x_p$, and
its square root is the s.e. of this partition value. Similarly, $E(\delta p)^2$
is the variance of the proportion of successes in n trials, with constant
probability p that the value of the variable selected will be
greater than $x_p$; and we know that this variance is pq/n. Consequently,
on taking the square root of both sides of the above
equation, we obtain the required s.e. $\epsilon$ of the partition value* as

$$\epsilon = \frac{1}{y}\sqrt{\frac{pq}{n}}. \tag{14}$$

If the value of y for the population is not given, it may be estimated
from the frequency distribution of the large sample.

* See also Kendall, 1940, 4.




An important case is that in which the distribution of the variable
in the population is normal, with s.d. $\sigma$. The value of p is the area
under the normal curve to the right of the partition value; and the
table of areas in § 21 enables us to read off the value of $x/\sigma$. The
table of ordinates for the normal curve in § 20 then gives the corresponding
value of $\sigma y$; and substitution of the values of y, p, q and n
in (14) gives the required s.e. For example, in the case of the median

$$p = \tfrac{1}{2},\quad q = \tfrac{1}{2},\quad \sigma y = 0.3989.$$

Hence the s.e. of the median, for a large sample of n members from
a normal population, is

$$\frac{\sigma}{0.3989}\sqrt{\frac{1}{4n}} = 1.25\,\frac{\sigma}{\sqrt{n}}.$$

This is 25 % greater than the s.e. of the mean. For the upper
quartile p = 1/4, the area from the mean to the quartile is 0.25, giving

$$x/\sigma = 0.6745,\quad \sigma y = 0.3178.$$

The s.e. of a quartile is therefore

$$\frac{\sigma}{0.3178}\sqrt{\frac{3}{16n}} = 1.363\,\frac{\sigma}{\sqrt{n}}.$$

Example. Prove in the same manner that, for large samples of n members
from a normal population of s.d. $\sigma$,

s.e. of 1st and 9th deciles = $1.71\,\sigma/\sqrt{n}$,

s.e. of 2nd and 8th deciles = $1.43\,\sigma/\sqrt{n}$,

s.e. of 3rd and 7th deciles = $1.32\,\sigma/\sqrt{n}$,

s.e. of 4th and 6th deciles = $1.27\,\sigma/\sqrt{n}$.
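
The coefficients quoted for the median, quartiles and deciles can be recomputed from (14). The sketch below is illustrative: the function name is our own, and the standard library's normal distribution takes the place of the tables of §§ 20 and 21. For a normal population of unit s.d. the ordinate y at the partition value is the normal density there, and the coefficient of $\sigma/\sqrt{n}$ is $\sqrt{pq}/y$.

```python
import math
from statistics import NormalDist


def partition_se_coefficient(p):
    """Coefficient c in s.e. = c * sigma / sqrt(n); p is the fraction above."""
    std = NormalDist()            # normal curve with unit s.d.
    x = std.inv_cdf(1 - p)        # partition value in units of sigma
    y = std.pdf(x)                # ordinate (sigma * y in the text's notation)
    return math.sqrt(p * (1 - p)) / y


for label, p in [("median", 0.5), ("quartile", 0.25), ("1st/9th decile", 0.9),
                 ("2nd/8th decile", 0.8), ("3rd/7th decile", 0.7),
                 ("4th/6th decile", 0.6)]:
    print(label, round(partition_se_coefficient(p), 3))
```

Since the coefficient depends on p only through pq and the symmetric ordinate, each pair of deciles equidistant from the median has the same s.e.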


COLLATERAL READING 

Yule and Kendall, 1937, 1, chapters xvm-^. 
Rietz, 1927, 2, pp. 114-18 and 140-52.
Rietz (ed.), 1924, 1, chapters v and vi.
Aitken, 1939, 1, chapter iii.
Kenney, 1939, 3, part ii, chapter vi.
Rider, 1939, 6, §§ 34-37.
Mills, 1938, 1, pp. 462-73.
Camp, 1934, 1, part ii, chapter iv (to p. 254).
Ezekiel, 1930, 2, chapter ii.
Tippett, 1931, 2, chapter iii.
Snedecor, 1938, 3, chapters iii, iv and viii.
Jones, 1924, 3, chapter xii.




EXAMPLES VI 

1. A biased coin was thrown 400 times, and heads resulted 240
times. Find the s.e. of the proportion of heads in 400 throws; and
deduce that the probability of throwing heads in a single trial
almost certainly lies between 0.53 and 0.67.

2. A random sample of 500 pineapples was taken from a large
consignment, and 65 were found to be bad. Show that the s.e. of
the proportion of bad ones in a sample of this size is 0.015; and deduce
that the percentage of bad pineapples in the consignment almost
certainly lies between 8.5 and 17.5.

3. In a random sample of 800 adults from the population of a
certain large city, 600 are found to have dark hair. In a random
sample of 1,000 adults from the inhabitants of another large city,
700 are dark-haired. Show that the difference of the proportions of
dark-haired people is nearly 2.4 times the s.e. of this difference for
samples of the above sizes.

4. In two large populations there are 30 and 25 % respectively
of fair-haired people. Is this difference likely to be hidden in samples
of 1,200 and 900 respectively from the two populations? (The
difference, 0.05, in the proportions is more than 2½ times the s.e.
of this difference for such samples. Hence it is unlikely that the real
difference will be hidden.)

5. Given that, on the average, 4 % of insured men of age 65 die
within a year, and that 60 of a particular group of 1,000 such men
died within a year, show that this group cannot be regarded as a
representative sample, seeing that the actual deviation of the
proportion of deaths is more than three times the s.e. of the proportion
for samples of this size.

6. In sampling of attributes a sample is drawn containing an even
number n of members. In drawing each of the first ½n members the
probability of success is p, and for each of the remaining ones it is
1 - p, the drawings being independent. Show that the expected
number of successes is ½n, and the variance of the number of successes
is npq.




7. The sampling distribution of a certain statistic is normal,
with mean 2.0 and s.e. 1.5. Show that the probability that a simple
sample will yield a value of the statistic greater than 5.0 is 0.02275;
and the probability that the value found will lie outside the range
-1.0 to 5.0 is 0.0455.

Also find c if the probability of getting a value of the statistic
greater than c is 0.07. (Ans. c = 4.213.)

8. A normal population has a mean of 0.1 and a s.d. of 2.1. Find
the probability that the mean of a simple sample of 900 members
will be negative. (Ans. 0.077 nearly.)

9. A sample of 900 members is found to have a mean of 3.47 cm.
Can it be reasonably regarded as a simple sample from a large
population with mean 3.23 cm. and s.d. 2.31 cm.? (Ans. No. The
deviation is more than three times the s.e. of the mean.)

10. The means of simple samples of 1,000 and 2,000 are 67.5
and 68.0 in. respectively. Can the samples be regarded as drawn
from the same population of s.d. 2.5 in.? (Ans. No. The difference
of the means is more than 5 times the s.e. of the difference, which
is 0.097 in. nearly.)

11. Show that the s.e. of the 8th decile of a simple sample of 900
members drawn from a normal population of s.d. 2.5 cm. is 0.12 cm.
nearly.

12. Prove that the coefficient of correlation between errors in
the partition values corresponding to proportions p and p' of the
frequency lying above them is, for large samples from a continuous
population, $\sqrt{p'q/(pq')}$, in which p > p'. In particular, for the lower
and upper quartiles this coefficient is 1/3. (Cf. Yule and Kendall,
1937, 1, p. 385.)

13. Using the value $\epsilon = 1.363\,\sigma/\sqrt{n}$ for the s.e. of the lower (or
upper) quartile in a large sample of n values from a normal population
of s.d. $\sigma$, find the s.e. of the semi-interquartile range.

Since the interquartile range is the difference between the upper
and lower quartiles, and the correlation between errors in these
statistics in a large sample is 1/3 (cf. Ex. 12), the s.e. of the
interquartile range is, in virtue of § 32 (46),

$$\sqrt{\epsilon^2 + \epsilon^2 - \tfrac{2}{3}\epsilon^2} = 2\epsilon/\sqrt{3}.$$

The s.e. of the semi-interquartile range is therefore

$$\epsilon/\sqrt{3} = 1.363\,\sigma/\sqrt{3n} = 0.787\,\sigma/\sqrt{n}.$$

14. The quantities $x_i$ (i = 1, 2, ..., n) are independent values
chosen at random, one from each of n populations with the same
mean, and with variances $\sigma_i^2$. Show that the linear estimate of the
common mean, which has the least sampling variance, is that
obtained by weighting the $x_i$ inversely as their variances. Also
prove that this minimum variance is H/n, where H is the harmonic
mean of the variances $\sigma_i^2$. (Cf. Ex. iv, 6.)

15. From a finite population of N values, with variance $\sigma^2$, a
random sample of n values is drawn without replacements. Show
that the sampling variance of the mean of the sample is

$$\frac{(N - n)\,\sigma^2}{n(N - 1)}.$$

16. Variable Poissonian sampling. The three types of sampling
considered in § 47 are all included in the following. Suppose that, in
the drawing of a Poissonian sample, there are various types of
sampling, each type having its own probability. Let $m_k$ be the
probability of the kth type of Poissonian sample, $n_k$ the size of a
sample of this type, $p_{kj}$ (j = 1, 2, ..., $n_k$) the probabilities of success
at the individual drawings in this type, so that the expected number
of successes in such a sample is $x_k = \sum_j p_{kj} = n_k p_k$, where $p_k$ is the
mean of the $p_{kj}$. In virtue of § 47 (4) the
variance of the number of successes per sample in this type is

$$\epsilon_k^2 = n_k p_k q_k - n_k \sigma_k^2,$$

where $q_k = 1 - p_k$ and $\sigma_k^2$ is the variance of the $p_{kj}$ for the kth type
of sampling. The expected number of successes per sample when all
types are possible is

$$\bar{x} = \sum_k m_k x_k,$$

and the variance of the number of successes per sample for all types,
being the mean square deviation of the number of successes per
sample from $\bar{x}$, is given by

$$\epsilon^2 = \sum_k m_k\left[\epsilon_k^2 + (x_k - \bar{x})^2\right] = \sum_k m_k \epsilon_k^2 + \sigma_x^2,$$

where $\sigma_x^2$ is the variance of the $x_k$ for all types. Hence the general
formula*

$$\epsilon^2 = \sum_k m_k x_k + \sigma_x^2 - \sum_k m_k \sum_j p_{kj}^2.$$

In the particular case, for example, in which all the $p_{kj}$ have a
common value, p, we have

$$x_k = p n_k,\quad \bar{x} = p\bar{n},\quad \sigma_x^2 = p^2 \sigma_n^2,$$

and the above formula becomes

$$\epsilon^2 = p\bar{n} + p^2\sigma_n^2 - p^2 \sum_k m_k n_k = \bar{n}pq + p^2\sigma_n^2,$$

which is § 47 (8).

17. In a certain population the proportion of members possessing
a given characteristic is p. Prove that, if p may be varied, the
probability of obtaining m such members in a simple sample of n
from the population is greatest when p = m/n.

18. Prove that, in simple sampling from a population, the 
expected value of the rth moment of the sample about any fixed 
value is equal to the rth moment of the population about that 
value. 

* Cf. Aitken, 1939, 1, pp. 53-4 and Coolidge, 1925, 2, pp. 66-72.





CHAPTER VII


STANDARD ERRORS OF STATISTICS 

54. Notation. Variances of population and sample 
Having already considered the s.e.'s of the mean and the partition
values, we now pass to those of various other statistics. We shall
argue in terms of a population of discrete values $x_i$ (i = 1, 2, ..., k),
the relative frequency of the value $x_i$ in the population being $p_i$.
The argument covers approximately the case in which the values
of the population are grouped in k classes of small class interval,
the ith class being centred at the value $x_i$ and having a relative
frequency $p_i$. The sum of the relative frequencies of all the other
classes is therefore $q_i$, where as usual $p_i + q_i = 1$. Then, as in § 50,
the mean $\mu$ of the population is given by

$$\mu = \sum_{i=1}^{k} p_i x_i \tag{1}$$

and its variance $\sigma^2$ by

$$\sigma^2 = \sum_{i=1}^{k} p_i (x_i - \mu)^2. \tag{2}$$

Consider now a simple sample of n members drawn from this
population. The sample values fall into the same classes as above;
that is to say, the values $x_i$ are the same for the sample as for the
population, but the relative frequencies of the classes vary with the
sample, fluctuating about the corresponding values for the population.
The probability that, in simple sampling, a value drawn will
belong to the ith class is $p_i$. Hence the frequency $f_i$ of that class, in
a sample of n members, has mean value $np_i$, since this is the expected
value of the number of successes in n independent trials with constant
probability $p_i$ of success. Thus

$$E(f_i) = np_i. \tag{3}$$

The mean $\bar{x}$ of the sample is given by

$$n\bar{x} = \sum_i f_i x_i \tag{4}$$

and its variance $S^2$ by

$$nS^2 = \sum_i f_i (x_i - \bar{x})^2. \tag{5}$$




We may notice at the outset that the expected value of $S^2$ in
sampling is not $\sigma^2$ but $(n - 1)\sigma^2/n$. To prove this we observe that
$S^2$, being the variance of the sample, is the second moment of the
sample about $\mu$ diminished by $(\bar{x} - \mu)^2$. Thus

$$S^2 = \frac{1}{n}\sum_i f_i (x_i - \mu)^2 - (\bar{x} - \mu)^2.$$

Consider the expectation of both members of this equation. In the
first term on the right $(x_i - \mu)^2$ remains constant during sampling;
and therefore, in virtue of (3), the expected value of this term is

$$\frac{1}{n}\sum_i \left[(x_i - \mu)^2 E(f_i)\right] = \sum_i p_i (x_i - \mu)^2 = \sigma^2.$$

Further, $E(\bar{x} - \mu)^2$ is the sampling variance of the mean, and is
therefore equal to $\sigma^2/n$. Substituting these values we obtain

$$E(S^2) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n - 1}{n}\,\sigma^2 \tag{6}$$

as stated above. If then we write

$$s^2 = \frac{n}{n - 1}\,S^2 = \frac{1}{n - 1}\sum_i f_i (x_i - \bar{x})^2, \tag{7}$$

our result is

$$E(s^2) = \sigma^2. \tag{8}$$

We say that $s^2$ is an unbiased estimate of $\sigma^2$, because its mean value
in sampling is $\sigma^2$. It is in this sense a better estimate of the population
variance than $S^2$. Incidentally also it follows from the above
argument that the second moment of the sample, about the mean
of the population, has $\sigma^2$ for its mean value.
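
Results (6) and (8) can be confirmed by enumerating all simple samples from a small population. The population and the sample size in the sketch below are assumed for illustration only; exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 4]                                          # hypothetical x_i
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]    # hypothetical p_i
n = 3

mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

expected_S2 = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in idx:
        prob *= probs[i]
    xs = [values[i] for i in idx]
    xbar = Fraction(sum(xs), n)
    # S^2: second moment of the sample about its own mean, divisor n.
    S2 = sum((x - xbar) ** 2 for x in xs) / n
    expected_S2 += prob * S2

expected_s2 = expected_S2 * n / (n - 1)   # s^2 = n S^2 / (n - 1)
print(expected_S2 == (n - 1) * sigma2 / n, expected_s2 == sigma2)
```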

55. Standard errors of class frequencies 

We have seen that, in simple sampling from the above population,
the probability that a value will fall into the ith class of the sample
is $p_i$, and that the frequency $f_i$ of this class has mean value $np_i$.
The deviation of this frequency from its mean value will be denoted
by $\delta f_i$. The s.e. of $f_i$ is the square root of the expected value of
$(\delta f_i)^2$; and, since $f_i$ is the number of successes in n trials of constant
probability $p_i$, its variance is $np_iq_i$. Thus

$$E(\delta f_i)^2 = np_iq_i = np_i(1 - p_i). \tag{9}$$



If $p_i$ is unknown, an estimate of its value derived from the sample
is $f_i/n$; and, as an approximation to the above sampling variance,
we have

$$E(\delta f_i)^2 = f_i - f_i^2/n. \tag{10}$$

But this is a fair approximation only if n is large. A better estimate
is obtained by multiplying this expression by n/(n - 1). To prove
this we consider the expected value of the second member of (10).
Since $E(f_i^2)$ is the second moment of $f_i$ about zero in its sampling
distribution, we have

$$E(f_i^2) = \text{sampling variance of } f_i + [E(f_i)]^2 = np_iq_i + (np_i)^2.$$

Consequently

$$E(f_i - f_i^2/n) = np_i - [p_i(1 - p_i) + np_i^2] = (n - 1)p_i(1 - p_i) = \frac{n - 1}{n}\,E(\delta f_i)^2.$$

Hence the formula

$$E(\delta f_i)^2 = \frac{n}{n - 1}\,(f_i - f_i^2/n) \tag{10'}$$

gives a better estimate than (10), since the expected value of the
second member of (10') is equal to the first member, so that it is an
unbiased estimate of the sampling variance of $f_i$. For a large sample
the factor n/(n - 1) is nearly equal to unity, and (10) gives a sufficiently
good estimate.
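
The bias of (10) and its removal in (10') may be seen by taking expected values over the binomial distribution of a class frequency. The values n = 10 and p = 3/10 in this sketch are assumed for illustration.

```python
from fractions import Fraction
from math import comb

n = 10
p = Fraction(3, 10)          # hypothetical class probability p_i
q = 1 - p

# Sampling distribution of the class frequency f: binomial with index n.
pmf = {f: comb(n, f) * p ** f * q ** (n - f) for f in range(n + 1)}

true_variance = n * p * q                                    # formula (9)
# Expected value of the second member of (10), namely f - f^2 / n.
expected_naive = sum(w * (f - Fraction(f * f, n)) for f, w in pmf.items())
# Expected value of the second member of (10').
expected_unbiased = expected_naive * n / (n - 1)

print(expected_naive == (n - 1) * p * q, expected_unbiased == true_variance)
```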

56. Covariance of the frequencies in different classes 
Since the sum of the class frequencies in any sample is constant,
for samples of given size n, it follows that

$$\sum_i \delta f_i = 0. \tag{11}$$

The covariance between the frequencies $f_i$ and $f_j$ of the ith and jth
classes is the covariance, $E(\delta f_i\,\delta f_j)$, of their deviations $\delta f_i$ and $\delta f_j$
from their mean values $np_i$ and $np_j$. The calculation of this covariance
may be associated with a correlation table in the values
of $\delta f_i$ and $\delta f_j$. Consider the array in which $\delta f_i$ has a fixed value.
Bearing (11) in mind we assume that, in all the samples for which
$\delta f_i$ has this fixed value, the excess $\delta f_i$ produces an equal deficiency
which is distributed among the other class frequencies, on the
average, in proportion to their expected values; that is to say, we
assume that, in the array with constant $\delta f_i$, the average value
of $\delta f_j$ is

$$-\frac{p_j}{q_i}\,\delta f_i. \tag{12}$$

In calculating the covariance we may, as proved in § 33, replace
each value of $\delta f_j$ in the array by the mean value for that array.
Then

$$E(\delta f_i\,\delta f_j) = -\frac{p_j}{q_i}\,E(\delta f_i)^2 = -np_ip_j \tag{13}$$

in virtue of (9). This is the required covariance.

If the values of $p_i$ and $p_j$ are unknown, an approximation is
obtained by taking them as $f_i/n$ and $f_j/n$. The second member of
(13) is then replaced by $-f_if_j/n$. A better estimate, however, is
given by

$$E(\delta f_i\,\delta f_j) = -\frac{f_if_j}{n - 1}. \tag{14}$$

This may be shown by determining the expected value of the second
member. Thus, in virtue of § 23 (6),

$$E(f_if_j) = E(f_i)E(f_j) + E(\delta f_i\,\delta f_j),$$

and therefore, by (13) and (3),

$$E(f_if_j) = n^2p_ip_j - np_ip_j = n(n - 1)p_ip_j = -(n - 1)E(\delta f_i\,\delta f_j).$$

Consequently, (14) gives an unbiased estimate of the covariance,
since the expected value of the second member is equal to the
covariance. In the case of a large sample, the denominator n - 1 may
be treated as n.
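
Formula (13) and the unbiased character of (14) can both be verified by enumeration. In the following sketch the three class probabilities and the sample size n = 4 are assumed for illustration; every ordered sample is generated, the two class frequencies are counted, and expected values are accumulated exactly.

```python
from fractions import Fraction
from itertools import product

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical p_i
n = 4
i, j = 0, 1                   # the two classes compared

e_fi = e_fj = e_fifj = Fraction(0)
for draws in product(range(len(probs)), repeat=n):
    prob = Fraction(1)
    for c in draws:
        prob *= probs[c]
    f_i, f_j = draws.count(i), draws.count(j)   # class frequencies
    e_fi += prob * f_i
    e_fj += prob * f_j
    e_fifj += prob * f_i * f_j

covariance = e_fifj - e_fi * e_fj
expected_estimate = -e_fifj / (n - 1)   # expected value of -f_i f_j / (n - 1)
print(covariance == -n * probs[i] * probs[j], expected_estimate == covariance)
```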

57. Standard errors in moments about a fixed value 
It is customary to employ Greek symbols for moments of the
population, and the corresponding Italics for those of the sample.
Thus $\mu_r$ and $\mu_r'$ denote respectively the rth moment of the population
about the mean, and about some other specified value, while $m_r$
and $m_r'$ denote the corresponding moments of the sample. For the
rth moment of the sample about x = 0 we have

$$nm_r' = \sum_i f_i x_i^r.$$

Hence, owing to deviations $\delta f_i$ of the class frequencies from their
mean values we have a deviation $\delta m_r'$ of the rth moment from
its mean value, given by

$$n\,\delta m_r' = \sum_i x_i^r\,\delta f_i.$$

On squaring both sides we obtain

$$n^2(\delta m_r')^2 = \sum_i x_i^{2r}(\delta f_i)^2 + \sum_{ij}{}' x_i^r x_j^r\,\delta f_i\,\delta f_j,$$

where $\sum'_{ij}$ denotes summation over all integral values of i and j from
1 to k, except those for which i = j. Taking expected values of both
members of this equation we deduce, since the values $x_i$ are the
same for each sample,

$$n^2E(\delta m_r')^2 = n\sum_i p_iq_i x_i^{2r} - n\sum_{ij}{}' p_ip_j x_i^r x_j^r.$$

Consequently, on dividing by n and rearranging,

$$nE(\delta m_r')^2 = \sum_i p_i x_i^{2r} - \Big(\sum_i p_i x_i^r\Big)^2 = \mu_{2r}' - \mu_r'^2,$$

the moments $\mu_{2r}'$ and $\mu_r'$ being those of the population. Hence the
required formula

$$E(\delta m_r')^2 = \frac{\mu_{2r}' - \mu_r'^2}{n}. \tag{15}$$

If, however, on taking expected values as above we use the estimates
(10') and (14), we obtain the unbiased estimate of the sampling
variance of $m_r'$,

$$E(\delta m_r')^2 = \frac{m_{2r}' - m_r'^2}{n - 1}, \tag{16}$$

in terms of the moments of the sample.




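Since the mean of a distribution is the first moment about x = 0,
we may deduce the variance of the mean of a sample of n members
by putting r = 1 in the above. Thus (15) becomes

$$E(\delta\bar{x})^2 = \frac{\mu_2' - \mu_1'^2}{n} = \frac{\sigma^2}{n}, \tag{17}$$

and (16) gives, in terms of sample moments,

$$E(\delta\bar{x})^2 = \frac{m_2' - m_1'^2}{n - 1} = \frac{S^2}{n - 1} = \frac{s^2}{n}, \tag{18}$$

in agreement with (8).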

58. Covariance of moments of different orders about a fixed value 
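The calculation of the covariance of the qth and rth moments of
the sample about the fixed value x = 0 is similar to the above. With
the same notation we have

$$nm_q' = \sum_i f_i x_i^q,\qquad nm_r' = \sum_i f_i x_i^r,$$

and therefore

$$n\,\delta m_q' = \sum_i x_i^q\,\delta f_i,\qquad n\,\delta m_r' = \sum_i x_i^r\,\delta f_i.$$

On multiplying corresponding sides of these equations we obtain

$$n^2\,\delta m_q'\,\delta m_r' = \sum_i x_i^{q+r}(\delta f_i)^2 + \sum_{ij}{}' x_i^q x_j^r\,\delta f_i\,\delta f_j.$$

Now take the expected value of each member of the equation.
Then, in virtue of (9) and (13),

$$n^2E(\delta m_q'\,\delta m_r') = n\sum_i p_iq_i x_i^{q+r} - n\sum_{ij}{}' p_ip_j x_i^q x_j^r,$$

and consequently, on dividing by n and rearranging,

$$nE(\delta m_q'\,\delta m_r') = \sum_i p_i x_i^{q+r} - \Big(\sum_i p_i x_i^q\Big)\Big(\sum_j p_j x_j^r\Big) = \mu_{q+r}' - \mu_q'\mu_r'.$$

Hence the required formula

$$E(\delta m_q'\,\delta m_r') = \frac{\mu_{q+r}' - \mu_q'\mu_r'}{n} \tag{19}$$

in terms of the moments of the population. If, however, on taking
expected values as above, we use the estimates (10') and (14),
we obtain the unbiased estimate

$$E(\delta m_q'\,\delta m_r') = \frac{m_{q+r}' - m_q'm_r'}{n - 1} \tag{20}$$

in terms of moments of the sample.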
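
Formulae (15) and (19) hold for simple samples of any size, and may therefore be checked exactly on a small case. The sketch below assumes a three-valued population, a sample of n = 3, and the orders q = 1, r = 2; none of these choices comes from the text.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 3]                                          # hypothetical x_i
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]    # hypothetical p_i
n, q, r = 3, 1, 2


def pop_moment(order):
    """Moment of the population about x = 0."""
    return sum(p * x ** order for p, x in zip(probs, values))


e_q = e_r = e_qr = e_rr = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in idx:
        prob *= probs[i]
    m_q = Fraction(sum(values[i] ** q for i in idx), n)   # sample moment m'_q
    m_r = Fraction(sum(values[i] ** r for i in idx), n)   # sample moment m'_r
    e_q += prob * m_q
    e_r += prob * m_r
    e_qr += prob * m_q * m_r
    e_rr += prob * m_r ** 2

var_m_r = e_rr - e_r ** 2
cov_m_q_m_r = e_qr - e_q * e_r
print(var_m_r == (pop_moment(2 * r) - pop_moment(r) ** 2) / n)
print(cov_m_q_m_r == (pop_moment(q + r) - pop_moment(q) * pop_moment(r)) / n)
```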





59. Standard errors of the variance and the standard deviation 
of a large sample 

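The mean of a sample changes with the sample. Hence we cannot use (15) to obtain the sampling variance of the second moment about the mean of the sample. In the case of large samples we may obtain the required result as follows. The variance of the sample is connected with the second moment about $x = 0$ by the usual relation

$$m_2 = m'_2 - \bar{x}^2.$$

Consequently

$$\delta m_2 = \delta m'_2 - 2\bar{x}\,\delta\bar{x},$$

and therefore, on squaring,

$$(\delta m_2)^2 = (\delta m'_2)^2 + 4\bar{x}^2(\delta\bar{x})^2 - 4\bar{x}\,\delta\bar{x}\,\delta m'_2, \tag{i}$$

where

$$\delta\bar{x} = \frac{1}{n}\sum_i x_i\,\delta f_i, \qquad \delta m'_2 = \frac{1}{n}\sum_i x_i^2\,\delta f_i. \tag{ii}$$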

Suppose now that the origin of $x$ is taken to coincide with the mean of the population. Then $\bar{x}$ becomes identical with $\delta\bar{x}$, for it is now the deviation of the mean of the sample from the mean of the population. If we take expected values of both members of (i) we may show that, when $n$ is large, the contributions of the second and third terms on the right are small compared with that of the first term. By (15), since the origin is now the mean of the population, the expected value of $(\delta m'_2)^2$ is $(\mu_4 - \mu_2^2)/n$. Also $\bar{x}^2(\delta\bar{x})^2$ is $(\delta\bar{x})^4$, and its expected value is of the order $1/n^2$, in virtue of (17). Similarly, $\bar{x}\,\delta\bar{x}\,\delta m'_2$ is $(\delta\bar{x})^2\,\delta m'_2$, and its expected value is at most of the order $1/n^{3/2}$, in virtue of (15) and (17). If then $n$ is large, the expectation of the first term on the right of (i) is large compared with those of the second and third terms, and we have the approximate formula*

$$E(\delta m_2)^2 = \frac{\mu_4 - \mu_2^2}{n}. \tag{21}$$


* Fisher has shown that, for samples of any size, the exact sampling variance of $s^2$ (the unbiased estimate of population variance) is

$$(\mu_4 - 3\mu_2^2)/n + 2\mu_2^2/(n - 1)$$ (Fisher, 1929, 1, p. 206).

For a normal population this expression has the value $2\sigma^4/(n - 1)$.



The s.e. of the variance of the sample is the square root of this quantity. In the particular case of a normal population, with s.d. $\sigma$, we have $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$, so that, for large samples from a normal population,

$$E(\delta m_2)^2 = 2\sigma^4/n. \tag{22}$$

From (21) may be deduced the s.e. of the s.d. of large samples. For the variance of the sample we have, with the above notation, $S^2 = m_2$, and therefore

$$\delta m_2 = 2S\,\delta S.$$

For large samples the factor $S$ may be taken as equal to the population parameter $\sigma$, $\delta S$ being small. Hence on squaring, and taking expected values of both members, we have

$$E(\delta m_2)^2 = 4\sigma^2 E(\delta S)^2,$$

and therefore, in virtue of (21),

$$E(\delta S)^2 = \frac{\mu_4 - \mu_2^2}{4n\mu_2}. \tag{23}$$

In the case of large samples from a normal population this is simply

$$E(\delta S)^2 = \frac{\sigma^2}{2n}, \tag{24}$$

and the s.e. of the s.d. is, in this case, $\sigma/\sqrt{2n}$.
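The large-sample results of this section are easily evaluated numerically. The sketch below, in Python, is an illustration under stated assumptions rather than part of the original text: the function names and the numerical values of $\sigma$ and $n$ are hypothetical. It computes the s.e. of the sample variance and of the sample s.d. from the population moments $\mu_2$, $\mu_4$, together with Fisher's exact expression quoted in the footnote.

```python
import math

def se_variance_large_sample(mu2, mu4, n):
    """Large-sample s.e. of the sample variance: sqrt((mu4 - mu2^2) / n)."""
    return math.sqrt((mu4 - mu2**2) / n)

def se_sd_large_sample(mu2, mu4, n):
    """Large-sample s.e. of the sample s.d.: sqrt((mu4 - mu2^2) / (4 n mu2))."""
    return math.sqrt((mu4 - mu2**2) / (4 * n * mu2))

def fisher_exact_variance_of_s2(mu2, mu4, n):
    """Exact sampling variance of the unbiased variance estimate (footnote)."""
    return (mu4 - 3 * mu2**2) / n + 2 * mu2**2 / (n - 1)

# Normal population: mu2 = sigma^2 and mu4 = 3 sigma^4 (values chosen for illustration).
sigma, n = 2.0, 800
mu2, mu4 = sigma**2, 3 * sigma**4

se_var = se_variance_large_sample(mu2, mu4, n)
se_sd = se_sd_large_sample(mu2, mu4, n)
exact_var_s2 = fisher_exact_variance_of_s2(mu2, mu4, n)
print(se_var**2, 2 * sigma**4 / n)            # normal case of the variance formula
print(se_sd, sigma / math.sqrt(2 * n))        # normal case of the s.d. formula
print(exact_var_s2, 2 * sigma**4 / (n - 1))   # Fisher's exact value
```

For large $n$ the exact and the approximate variances differ only in the replacement of $n - 1$ by $n$.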


60. Comparison of the standard deviations of two large samples 

For two independent large samples of $n_1$ and $n_2$ members from the same population, the variance $\epsilon^2$ of the difference of their standard deviations, being the sum of the variances of their standard deviations, is in virtue of (23),

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{25}$$

In particular, if the population is normal, this equation takes the simple form

$$\epsilon^2 = \frac{\sigma^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{26}$$

The expected value of the difference of the s.d.'s of two such samples is zero. Hence, by comparing the actual difference $S_1 - S_2$ with the s.e. $\epsilon$, we may test in the usual manner the credibility of the hypothesis that the samples were drawn from the same population.

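For two large simple samples from different populations, whose moments are $\mu_r$ and $\mu'_r$ respectively, the s.e. of the difference of their standard deviations is given by

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2 n_1} + \frac{\mu'_4 - \mu'^{\,2}_2}{4\mu'_2 n_2}. \tag{27}$$

And in particular, if the populations are normal with standard deviations $\sigma_1$ and $\sigma_2$ respectively,

$$\epsilon^2 = \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}. \tag{28}$$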


Example. The s.d. of a simple sample of 2,000 members is 5.9 years, and that of an independent sample of 2,500 members is 6.1 years. May the samples be reasonably regarded as from the same normal population?

On the hypothesis that they are from the same normal population, an approximate value for its s.d. is 6 years, and the s.e. of the difference of the standard deviations of samples of the above sizes is, by (26),

$$\epsilon = 6\sqrt{\tfrac{1}{2}\left(\tfrac{1}{2000} + \tfrac{1}{2500}\right)} = 6(0.0212) = 0.127 \text{ year}.$$

The actual difference of 0.2 year is less than $1.96\epsilon$, and is therefore not significant at the 5% level of probability. There is thus no real evidence against the hypothesis.
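The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the second sample size is taken as 2,500, the value consistent with the factor 0.0212 in the working.

```python
import math

def se_difference_of_sds_normal(sigma, n1, n2):
    """S.e. of S1 - S2 for two large samples from one normal population:
    sigma * sqrt((1/n1 + 1/n2) / 2)."""
    return sigma * math.sqrt(0.5 * (1 / n1 + 1 / n2))

eps = se_difference_of_sds_normal(6.0, 2000, 2500)
ratio = (6.1 - 5.9) / eps   # observed difference in units of its s.e.
print(round(eps, 3), round(ratio, 2))
```

The ratio of the observed difference to its s.e. is what is compared with the chosen significance limit.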


Sampling from a Bivariate Population 

61. Sampling covariance of the means of the variables 

Suppose now that the population is bivariate, with $p_i$ as relative frequency of the pair of values $(x_i, y_i)$, or of the class with centre at that point. The means of $x$ and $y$ in the population are then

$$\mu = \sum_i p_i x_i, \qquad \mu' = \sum_i p_i y_i,$$

and the moments of orders $q$ and $r$ in $x$ and $y$ respectively are

$$\mu'_{q,r} = \sum_i p_i x_i^q y_i^r, \qquad \mu_{q,r} = \sum_i p_i (x_i - \mu)^q (y_i - \mu')^r,$$

the first about the point (0,0), and the second about the mean of the population. In particular $\mu_{1,1}$ is the covariance of $x$ and $y$ in the population.

In a simple sample of $n$ pairs of values, let $f_i$ be the frequency of the $i$th class. Then the values of the moments of the sample are given by

$$n m'_{q,r} = \sum_i f_i x_i^q y_i^r, \qquad n m_{q,r} = \sum_i f_i (x_i - \bar{x})^q (y_i - \bar{y})^r,$$

the covariance of $x$ and $y$ in the sample being $m_{1,1}$. The argument of §§ 55 and 56 still holds, and the formulae

$$E(\delta f_i)^2 = n p_i(1 - p_i), \qquad E(\delta f_i\,\delta f_j) = -n p_i p_j$$

are still valid.

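Corresponding to (17) we may now prove that the covariance of the means $\bar{x}$, $\bar{y}$, in samples from a bivariate population, is $\mu_{1,1}/n$. For

$$n\bar{x} = \sum_i f_i x_i, \qquad n\bar{y} = \sum_i f_i y_i.$$

Hence the deviations $\delta\bar{x}$, $\delta\bar{y}$ from $\mu$, $\mu'$ respectively, and the deviations $\delta f_i$ from their mean values $np_i$, are connected by

$$n\,\delta\bar{x} = \sum_i x_i\,\delta f_i, \qquad n\,\delta\bar{y} = \sum_i y_i\,\delta f_i,$$

and therefore on multiplication

$$n^2\,\delta\bar{x}\,\delta\bar{y} = \sum_i x_i y_i (\delta f_i)^2 + {\sum_{i,j}}' x_i y_j\,\delta f_i\,\delta f_j.$$

Taking the expected value of each member we deduce

$$n^2 E(\delta\bar{x}\,\delta\bar{y}) = \sum_i x_i y_i\, n p_i(1 - p_i) - {\sum_{i,j}}' x_i y_j\, n p_i p_j,$$

and therefore

$$n E(\delta\bar{x}\,\delta\bar{y}) = \sum_i p_i x_i y_i - \Big(\sum_i p_i x_i\Big)\Big(\sum_j p_j y_j\Big) = \mu'_{1,1} - \mu\mu' = \mu_{1,1}.$$

Hence the required result

$$E(\delta\bar{x}\,\delta\bar{y}) = \mu_{1,1}/n. \tag{29}$$

From this it follows immediately that the coefficient of correlation between $\bar{x}$ and $\bar{y}$ is equal to the correlation $\rho$ between $x$ and $y$ in the population. For the correlation between $\bar{x}$ and $\bar{y}$ is

$$\frac{\text{covariance of } \bar{x} \text{ and } \bar{y}}{(\text{s.e. of } \bar{x})(\text{s.e. of } \bar{y})} = \frac{\mu_{1,1}/n}{(\sigma_1/\sqrt{n})(\sigma_2/\sqrt{n})} = \frac{\mu_{1,1}}{\sigma_1\sigma_2} = \rho,$$

$\sigma_1^2$ and $\sigma_2^2$ being the variances of $x$ and $y$ in the population.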
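The result that the covariance of the two sample means is the population covariance divided by $n$ may be verified exactly by enumeration. The following Python sketch is illustrative and not from the original text: the three population points, their probabilities and the sample size are assumptions chosen only to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

# Illustrative bivariate population: points (x_i, y_i) with probabilities p_i.
points = [(0, 1), (1, 0), (2, 3)]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
n = 3

mu_x = sum(p * x for (x, _), p in zip(points, probs))
mu_y = sum(p * y for (_, y), p in zip(points, probs))
mu_11 = sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in zip(points, probs))

# Exact expectation of (x_bar - mu_x)(y_bar - mu_y) over all ordered samples.
cov_means = Fraction(0)
for sample in product(range(len(points)), repeat=n):
    prob = Fraction(1)
    for i in sample:
        prob *= probs[i]
    x_bar = Fraction(sum(points[i][0] for i in sample), n)
    y_bar = Fraction(sum(points[i][1] for i in sample), n)
    cov_means += prob * (x_bar - mu_x) * (y_bar - mu_y)

print(cov_means == mu_11 / n)
```

Since both s.e.'s of the means carry the same factor $1/\sqrt{n}$, the same enumeration confirms that the correlation of the means equals that of the population.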



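We may also prove that the expected value of the covariance of $x$ and $y$ in the sample is given by

$$E(m_{1,1}) = \frac{n-1}{n}\,\mu_{1,1}, \tag{30}$$

which corresponds to (6). For, by the known relation, § 23 (6),

$$m_{1,1} = m'_{1,1} - \bar{x}\bar{y}.$$

Taking the expected value of each member we have, in virtue of (29),

$$E(m_{1,1}) = \mu'_{1,1} - \left(\mu\mu' + \frac{\mu_{1,1}}{n}\right) = \mu_{1,1} - \frac{\mu_{1,1}}{n} = \frac{n-1}{n}\,\mu_{1,1},$$

as required.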

62. Variance and covariance of moments about a fixed point 
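The moment of the sample about the fixed point (0,0), of order $q$ in $x$ and $r$ in $y$, is given by

$$n m'_{q,r} = \sum_i f_i x_i^q y_i^r,$$

and therefore, with the usual notation,

$$n\,\delta m'_{q,r} = \sum_i x_i^q y_i^r\,\delta f_i.$$

Squaring both sides, and taking expected values as in § 57, we deduce

$$n^2 E(\delta m'_{q,r})^2 = \sum_i x_i^{2q} y_i^{2r}\, n p_i(1 - p_i) - {\sum_{i,j}}' x_i^q y_i^r x_j^q y_j^r\, n p_i p_j,$$

and therefore

$$E(\delta m'_{q,r})^2 = \frac{1}{n}\left(\mu'_{2q,2r} - \mu'^{\,2}_{q,r}\right) \tag{31}$$

in terms of the moments of the population. This is the sampling variance of $m'_{q,r}$. If, in taking expected values, we use the approximations (10') and (14), we obtain the unbiased estimate

$$E(\delta m'_{q,r})^2 \simeq \frac{1}{n-1}\left(m'_{2q,2r} - m'^{\,2}_{q,r}\right) \tag{32}$$

in terms of the moments of the sample.

Similarly, we may find the sampling covariance of the moments $m'_{q,r}$ and $m'_{s,t}$ about (0,0), by multiplying together the expressions for $n\,\delta m'_{q,r}$ and $n\,\delta m'_{s,t}$, and taking expected values of both sides. Proceeding as above we obtain the result

$$E(\delta m'_{q,r}\,\delta m'_{s,t}) = \frac{1}{n}\left(\mu'_{q+s,r+t} - \mu'_{q,r}\,\mu'_{s,t}\right). \tag{33}$$

In particular by putting $r = s = 0$ and $q = t = 1$, and observing that $\mu'_{1,0} = \mu$, $\mu'_{0,1} = \mu'$, we find again the covariance of the means

$$E(\delta\bar{x}\,\delta\bar{y}) = \frac{1}{n}\left(\mu'_{1,1} - \mu\mu'\right) = \frac{\mu_{1,1}}{n}.$$

As in other cases we may deduce the unbiased estimate corresponding to (33), with sample moments and denominator $n - 1$.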

63. Standard error of the covariance of a large sample 

By argument similar to that of § 59 we may find the s.e. of the covariance of a large sample from a bivariate population. For, with the usual notation, since

$$m_{1,1} = m'_{1,1} - \bar{x}\bar{y},$$

we have

$$\delta m_{1,1} = \delta m'_{1,1} - \bar{y}\,\delta\bar{x} - \bar{x}\,\delta\bar{y}.$$

If now we take the origin at the mean of the population, $\bar{x}$ becomes identical with $\delta\bar{x}$ and $\bar{y}$ with $\delta\bar{y}$. Consequently, if we retain only the principal terms we have, on squaring the last equation and taking expected values,

$$E(\delta m_{1,1})^2 = E(\delta m'_{1,1})^2,$$

and, since the origin is at the mean of the population, this is, in virtue of (31),

$$E(\delta m_{1,1})^2 = \frac{1}{n}\left(\mu_{2,2} - \mu_{1,1}^2\right), \tag{34}$$

giving the square of the s.e. of the covariance of the sample.

64. Standard error of the coefficient of correlation 

The s.e. of the coefficient of correlation, $r$, for large samples of $n$ pairs from a bivariate population, may be found as follows.*

* Cf. Bowley, 1920, 2, pp. 422-3.



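Omitting the comma between the two subscripts in the moment symbols we have, by the usual formula,

$$r = m_{11}/\sqrt{m_{20}\,m_{02}},$$

and therefore, by logarithmic differentiation,

$$\frac{\delta r}{r} = \frac{\delta m_{11}}{m_{11}} - \frac{1}{2}\frac{\delta m_{20}}{m_{20}} - \frac{1}{2}\frac{\delta m_{02}}{m_{02}}. \tag{i}$$

If the origin is taken at the mean of the population, so that $\bar{x}$ and $\bar{y}$ are identical with $\delta\bar{x}$ and $\delta\bar{y}$, and we retain only small quantities of the first order, we find as in the preceding section

$$\delta m_{11} = \delta m'_{11} = \frac{1}{n}\sum_i x_i y_i\,\delta f_i.$$

Similarly, from the relations

$$m_{20} = m'_{20} - \bar{x}^2, \qquad m_{02} = m'_{02} - \bar{y}^2,$$

we find, on retaining only the principal terms,

$$\delta m_{20} = \delta m'_{20} = \frac{1}{n}\sum_i x_i^2\,\delta f_i, \qquad \delta m_{02} = \delta m'_{02} = \frac{1}{n}\sum_i y_i^2\,\delta f_i.$$

Now substitute these values in (i). Then, since $n$ is large and we are retaining only small quantities of the first order, we may replace the denominators in (i) by the moments and correlation coefficient $\rho$ for the population. Thus

$$\frac{\delta r}{\rho} = \frac{1}{n}\sum_i\left[\frac{x_i y_i}{\mu_{11}} - \frac{x_i^2}{2\mu_{20}} - \frac{y_i^2}{2\mu_{02}}\right]\delta f_i = \frac{1}{n}\sum_i F(x_i, y_i)\,\delta f_i,$$

where $F(x_i, y_i)$ denotes the expression in brackets. Squaring both sides we have

$$\left(\frac{\delta r}{\rho}\right)^2 = \frac{1}{n^2}\left[\sum_i F^2(x_i, y_i)(\delta f_i)^2 + {\sum_{i,j}}' F(x_i, y_i)F(x_j, y_j)\,\delta f_i\,\delta f_j\right],$$

and therefore, on taking expected values,

$$\frac{E(\delta r)^2}{\rho^2} = \frac{1}{n}\left[\sum_i F^2(x_i, y_i)\,p_i(1 - p_i) - {\sum_{i,j}}' F(x_i, y_i)F(x_j, y_j)\,p_i p_j\right] = \frac{1}{n}\left[\sum_i F^2(x_i, y_i)\,p_i - \sum_{i,j} F(x_i, y_i)F(x_j, y_j)\,p_i p_j\right].$$

Now the second sum on the right is equal to zero. For

$$\sum_{i,j} F(x_i, y_i)F(x_j, y_j)\,p_i p_j = \Big[\sum_i F(x_i, y_i)\,p_i\Big]^2 = \left[\frac{\mu_{11}}{\mu_{11}} - \frac{\mu_{20}}{2\mu_{20}} - \frac{\mu_{02}}{2\mu_{02}}\right]^2 = 0.$$

On substituting the value of $F(x_i, y_i)$ we deduce the sampling variance of $r$ in the form

$$E(\delta r)^2 = \frac{\rho^2}{n}\left[\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right]. \tag{35}$$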

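This result assumes a very simple form in the case of a bivariate normal population. The parameter values are then

$$\mu_{22} = (1 + 2\rho^2)\sigma_1^2\sigma_2^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{20} = \sigma_1^2, \quad \mu_{02} = \sigma_2^2,$$

$$\mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3,$$

and on substituting these values we obtain

$$E(\delta r)^2 = \frac{1}{n}(1 - \rho^2)^2.$$

Consequently

$$\text{s.e. of } r = \frac{1 - \rho^2}{\sqrt{n}}. \tag{36}$$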


These formulae hold for large values of $n$. Their application, however, is limited by the fact that the sampling distribution of $r$ is not even approximately normal when $r$ is fairly large. The s.e. test for $r$ should therefore be applied only for large samples and moderate values of $r$. A much more useful test, which is applicable to small or large samples, will be considered in Chapter X.
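The general expression for the sampling variance of $r$ in terms of the population moments $\mu_{qr}$, and its reduction to $(1 - \rho^2)^2/n$ for a bivariate normal population, may be checked numerically. The Python sketch below is an illustration, not part of the original text: the function names, the dictionary layout for the moments and the numerical values of $\rho$, $\sigma_1$, $\sigma_2$ and $n$ are all assumptions.

```python
import math

def sampling_variance_of_r(rho, n, mu):
    """General large-sample variance of r; mu[(q, r)] is the central moment mu_qr."""
    bracket = (mu[2, 2] / mu[1, 1]**2
               + mu[4, 0] / (4 * mu[2, 0]**2)
               + mu[0, 4] / (4 * mu[0, 2]**2)
               + mu[2, 2] / (2 * mu[2, 0] * mu[0, 2])
               - mu[3, 1] / (mu[1, 1] * mu[2, 0])
               - mu[1, 3] / (mu[1, 1] * mu[0, 2]))
    return rho**2 * bracket / n

def bivariate_normal_moments(rho, s1, s2):
    """Central moments of a bivariate normal population up to the fourth order."""
    return {(2, 0): s1**2, (0, 2): s2**2, (1, 1): rho * s1 * s2,
            (2, 2): (1 + 2 * rho**2) * s1**2 * s2**2,
            (4, 0): 3 * s1**4, (0, 4): 3 * s2**4,
            (3, 1): 3 * rho * s1**3 * s2, (1, 3): 3 * rho * s1 * s2**3}

rho, n = 0.45, 3600
var_r = sampling_variance_of_r(rho, n, bivariate_normal_moments(rho, 2.0, 5.0))
se_general = math.sqrt(var_r)
se_normal = (1 - rho**2) / math.sqrt(n)
print(se_general, se_normal)
```

The agreement does not depend on the values chosen for $\sigma_1$ and $\sigma_2$, which cancel from every term of the bracket.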


COLLATERAL READING 

Rietz, 1927, 2, chapter v.
Yule and Kendall, 1937, 1, chapter xxi.
Jones, 1924, 3, chapters xiii and xiv.
Rietz (ed.), 1924, 1, chapter v.
Camp, 1934, 1, part II, chapter iv.
Bowley, 1920, 2, part II, chapter ix.
Kendall, 1943, 2, chapter ix.



EXAMPLES VII

1. In a sample of 10,000 from a normal population the s.d. is 2.52 cm. Show that the s.e. of this quantity is 0.018 cm. nearly, and hence that the s.d. of the population almost certainly lies between 2.46 and 2.58 cm.

2. A random sample of 6,400 members from a certain population has a s.d. of 6.80 years, and a fourth moment of 3298 yr.$^4$ Show that three times the s.e. of the s.d. is 0.094 nearly, and hence that the s.d. of the population almost certainly lies between 6.7 and 6.9 years.

3. The s.d. of a random sample of 1,000 members is 5.9 years, and that of an independent sample of 900 members is 6.1 years. Show that the samples may reasonably be regarded as drawn from equally variable normal populations, since the difference of their standard deviations is very little greater than its s.e.

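4. Standard error of the coefficient of variation. By definition this coefficient is $V = \sqrt{m_2}/\bar{x}$. Logarithmic differentiation gives

$$\frac{\delta V}{V} = \frac{\delta m_2}{2m_2} - \frac{\delta\bar{x}}{\bar{x}}.$$

Squaring both sides and taking expected values show that, for large samples

$$\frac{E(\delta V)^2}{V^2} = \frac{\mu_4 - \mu_2^2}{4n\mu_2^2} + \frac{\mu_2}{n\mu'^{\,2}_1} - \frac{E(\delta\bar{x}\,\delta m_2)}{\mu_2\,\mu'_1}.$$

It can be shown that, for samples from a normal population, $E(\delta\bar{x}\,\delta m_2) = 0$. Show that in this case the above result leads to

s.e. of $V = V\sqrt{(1 + 2V^2)/2n}$.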

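5. Standard error of a moment about the mean of a large sample. Proceeding as in § 59 we have

$$n m_r = \sum_i f_i (x_i - \bar{x})^r = n\left(m'_r - r\bar{x}\,m'_{r-1} + \dots\right),$$

and therefore

$$\delta m_r = \delta m'_r - r m'_{r-1}\,\delta\bar{x} - r\bar{x}\,\delta m'_{r-1} + \dots$$

If the origin is taken at the mean of the population, $\bar{x}$ becomes identical with $\delta\bar{x}$; and, as we need retain only the terms of lowest order, we neglect those after the first two. On squaring we may write the result

$$(\delta m_r)^2 = (\delta m'_r)^2 - 2r\mu_{r-1}\,\delta m'_r\,\delta\bar{x} + r^2\mu_{r-1}^2(\delta\bar{x})^2,$$

and on taking expected values of both sides we obtain, in virtue of (15) and (19), the origin being the mean of the population,

$$E(\delta m_r)^2 = \frac{1}{n}\left(\mu_{2r} - \mu_r^2 + r^2\mu_{r-1}^2\mu_2 - 2r\mu_{r-1}\mu_{r+1}\right).$$

The square root of this quantity is the s.e. of $m_r$.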

6. Deduce that, for large samples from a normal population of s.d. $\sigma$, the s.e.'s of $m_3$ and $m_4$ are $\sigma^3\sqrt{6/n}$ and $\sigma^4\sqrt{96/n}$ respectively.

7. A random sample of 3,600 pairs of values from a bivariate normal population showed a correlation coefficient of 0.45. Find limits to the correlation in the population.

Since the sample is large we may take 0.45 as an approximate value for $\rho$. The s.e. of $r$ is then $(1 - 0.2025)/60 = 0.0133$, so that $3\epsilon = 0.04$ nearly. The coefficient of correlation in the population almost certainly lies between 0.41 and 0.49.

8. A random sample of 2,500 pairs of values from a bivariate normal population showed a correlation of 0.1. Is this really significant of correlation in the population?

On the assumption that the population is uncorrelated we have the s.e. of $r = 1/50 = 0.02$. The actual value is 5 times this s.e., and is therefore significant.

Show that the above correlation of 0.1 would not be significant in a sample of less than 400 pairs.

9. By means of the result of Examples VI, 18 (p. 129) deduce* the formulae (15) of § 57 and (19) of § 58.

* Cf. Kendall, 1943, 2, pp. 205-6.
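The arithmetic of the worked Examples 7 and 8 may be reproduced as follows. This Python sketch is illustrative only: the function name is hypothetical, the sample size in Example 8 is taken as 2,500 (the value consistent with the s.e. of 1/50), and twice the s.e. is assumed as the limit of significance in the final part.

```python
import math

def se_of_r(rho, n):
    """Large-sample s.e. of r for a bivariate normal population: (1 - rho^2)/sqrt(n)."""
    return (1 - rho**2) / math.sqrt(n)

# Example 7: r = 0.45 from 3,600 pairs; limits at three times the s.e.
se7 = se_of_r(0.45, 3600)
limits7 = (0.45 - 3 * se7, 0.45 + 3 * se7)

# Example 8: r = 0.1 from 2,500 pairs, tested against an uncorrelated population.
se8 = se_of_r(0.0, 2500)
ratio8 = 0.1 / se8

# Sample size at which 0.1 equals twice the s.e. 1/sqrt(n) (assumed criterion).
n_limit = round((2 / 0.1)**2)

print(round(se7, 4), [round(v, 2) for v in limits7], round(ratio8, 2), n_limit)
```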


CHAPTER VIII


BETA AND GAMMA DISTRIBUTIONS 


65. Beta and Gamma Functions 

For the benefit of the student who is not familiar with them, we 
shall first prove the elementary properties of the Beta and Gamma 
functions. The integral 

$$\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}\,dx \tag{1}$$

converges if $n$ is positive. It is a function of $n$ called the Gamma function. Clearly

$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1. \tag{2}$$

Also, if $n - 1$ is positive, we have on integration by parts

$$\Gamma(n) = \left[-x^{n-1}e^{-x}\right]_0^\infty + (n - 1)\int_0^\infty x^{n-2}e^{-x}\,dx,$$

so that

$$\Gamma(n) = (n - 1)\,\Gamma(n - 1). \tag{3}$$

Hence, if $n$ is a positive integer,

$$\Gamma(n) = (n - 1)(n - 2)\dots 2.1.\Gamma(1) = (n - 1)!. \tag{4}$$

On account of the property expressed by (3) and (4), $\Gamma(n)$ is often denoted by $(n - 1)!$ whether $n$ is integral or not. Also, on writing $x^2$ in place of $x$ in (1), we have the alternative formula

$$\Gamma(n) = 2\int_0^\infty x^{2n-1}\exp(-x^2)\,dx. \tag{5}$$

And, by an obvious substitution, it is easily verified that, if $a$ is positive,

$$\int_0^\infty x^{n-1}e^{-ax}\,dx = a^{-n}\,\Gamma(n). \tag{6}$$
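The elementary properties of the Gamma function just proved may be confirmed numerically. The sketch below uses Python's standard `math.gamma`; the helper `integrate_midpoint`, the truncation of the infinite range at $x = 40$, and the particular values of $n$ and $a$ are assumptions made for illustration.

```python
import math

def integrate_midpoint(f, a, b, steps=200000):
    """Plain midpoint rule; adequate for the smooth integrand used here."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Recurrence (3): Gamma(n) = (n - 1) Gamma(n - 1), here for a non-integral n.
n = 3.7
recurrence_gap = math.gamma(n) - (n - 1) * math.gamma(n - 1)

# Property (4): Gamma(n) = (n - 1)! for a positive integer n.
factorial_gap = math.gamma(6) - math.factorial(5)

# Scaled integral: int_0^inf x^(n-1) e^(-a x) dx = a^(-n) Gamma(n).
# The upper limit is truncated at 40, where the integrand is negligible.
a = 2.5
numeric = integrate_midpoint(lambda x: x**(n - 1) * math.exp(-a * x), 0.0, 40.0)
closed = a**(-n) * math.gamma(n)
print(recurrence_gap, factorial_gap, numeric, closed)
```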



The integral

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx \tag{7}$$

also converges if $m$ and $n$ are positive. It is a function of $m$ and $n$ called the Beta function. That it is symmetrical in $m$ and $n$ is easily shown by the substitution $z = 1 - x$. Then (7) becomes

$$B(m, n) = \int_0^1 z^{n-1}(1 - z)^{m-1}\,dz = B(n, m).$$

Further, on substituting $x = \sin^2\theta$ in (7) we obtain

$$B(m, n) = 2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta, \tag{8}$$

and therefore in particular

$$B(\tfrac12, \tfrac12) = 2\int_0^{\pi/2} d\theta = \pi, \tag{9}$$

while from (7) it is obvious that

$$B(1, 1) = 1. \tag{10}$$

Lastly the substitution $x = 1/(1 + y)$ in (7) leads to an important alternative definition, viz.

$$B(m, n) = \int_0^\infty \frac{y^{n-1}\,dy}{(1 + y)^{m+n}}, \tag{11}$$


and in this integral m and n may be interchanged, in virtue of the 
symmetry of the function. 

Example. Show that

$$B(m, n) = \int_0^1 \frac{x^{m-1} + x^{n-1}}{(1 + x)^{m+n}}\,dx.$$

This may be deduced from (11) by dividing the range of integration into two parts, 0 to 1 and 1 to $\infty$, and putting $y = 1/x$ in the integration over the second part.

66. Relation between the two functions 

That the Beta and Gamma functions are connected by the relation

$$B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)} \tag{12}$$

may be proved as follows. Consider the integrals

$$I_1 = 2\int_0^a x^{2m-1}\exp(-x^2)\,dx, \qquad I_2 = 2\int_0^a y^{2n-1}\exp(-y^2)\,dy,$$

whose limiting values as $a$ tends to infinity are $\Gamma(m)$ and $\Gamma(n)$ respectively, in virtue of (5). Then

$$I_1 I_2 = 4\int_0^a dx\int_0^a x^{2m-1}y^{2n-1}\exp(-x^2 - y^2)\,dy$$

or, on changing to polar coordinates,

$$I_1 I_2 = 4\iint \exp(-r^2)\,r^{2m+2n-1}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,dr\,d\theta,$$

the integration extending over the square $OACB$ (see Fig. 4, p. 66). Since the integrand is positive, this integral is intermediate in value between the integrals of the same function extended over the quadrant $OAB$, of radius $a$, and the quadrant $OPQ$, of radius $a\sqrt{2}$. Hence $I_1 I_2$ lies in value between

$$4\int_0^{\pi/2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\int_0^a \exp(-r^2)\,r^{2m+2n-1}\,dr$$

and the corresponding integral with 0 and $a\sqrt{2}$ as the limits for $r$. But, as $a$ tends to infinity, each of these integrals tends to the limit $B(m, n)\,\Gamma(m + n)$, while $I_1$ and $I_2$ tend to the limits $\Gamma(m)$ and $\Gamma(n)$ respectively. Consequently

$$B(m, n)\,\Gamma(m + n) = \Gamma(m)\,\Gamma(n)$$

as stated in (12).
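Relation (12) may be checked numerically by comparing the trigonometric form of the Beta integral with the ratio of Gamma functions. The following Python sketch is illustrative; the function names, the step count and the values of $m$ and $n$ are assumptions, chosen with $m, n \ge 1$ so that the integrand is smooth throughout the range.

```python
import math

def beta_via_gamma(m, n):
    """Gamma(m) Gamma(n) / Gamma(m + n), the right-hand side of (12)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

def beta_via_trig_integral(m, n, steps=20000):
    """Midpoint-rule value of 2 * int_0^{pi/2} sin^(2m-1) cos^(2n-1) d(theta)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        total += math.sin(theta)**(2 * m - 1) * math.cos(theta)**(2 * n - 1)
    return 2 * h * total

m, n = 2.5, 1.5
numeric_beta = beta_via_trig_integral(m, n)
gamma_ratio = beta_via_gamma(m, n)
print(numeric_beta, gamma_ratio)
```

For these values both sides equal $\pi/16$, since the integrand reduces to $\sin^4\theta\cos^2\theta$.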

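Putting $m = n = \tfrac12$ in this result we have, in virtue of (2) and (9), $\{\Gamma(\tfrac12)\}^2 = \pi$. Consequently

$$\Gamma(\tfrac12) = \int_0^\infty x^{-1/2}e^{-x}\,dx = \sqrt{\pi}. \tag{13}$$

By writing $x^2$ in place of $x$ in this integral, or by putting $n = \tfrac12$ in (5), we deduce

$$\int_0^\infty \exp(-x^2)\,dx = \tfrac12\sqrt{\pi}. \tag{14}$$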


Example 1. Show that, if $m$ is a positive integer,

$$\Gamma(m + \tfrac12) = \frac{2m - 1}{2}\cdot\frac{2m - 3}{2}\cdots\frac{3}{2}\cdot\frac{1}{2}\sqrt{\pi}.$$

Example 2. Show that, if $m$ and $n$ are positive integers,

$$B(m, n) = \frac{(m - 1)!\,(n - 1)!}{(m + n - 1)!}.$$

In particular, $B(1, n) = 1/n$.

Example 3. Show that

$$B(m + 1, n)/B(m, n) = m/(m + n)$$

and

$$B(m + 2, n)/B(m, n) = m(m + 1)/(m + n)(m + n + 1).$$

Example 4. Show that

$$B(m + 2, n - 2)/B(m, n) = m(m + 1)/(n - 1)(n - 2).$$

Example 5. Show that

$$\int_0^a (a - x)^{m-1}x^{n-1}\,dx = a^{m+n-1}B(m, n).$$


67. Gamma distribution and Gamma variates 


In virtue of (1) a continuous variable $x$, which is distributed with probability density

$$\phi(x) = \frac{e^{-x}x^{l-1}}{\Gamma(l)} \tag{15}$$

throughout the range 0 to $\infty$, is called a Gamma variate with parameter $l$; and its distribution is a Gamma distribution.* The factor $1/\Gamma(l)$ ensures that the integral of $\phi(x)$ over the whole range of values of $x$ is unity. The reader should sketch the probability curve $y = \phi(x)$ for the distribution. He will find that it is asymptotic to the $x$-axis and that, if $l > 1$, it has a mode at $x = l - 1$. If $l > 2$ it also touches the $x$-axis at the origin; while, if $1 < l < 2$, it is tangent to the $y$-axis at that point. If, however, $0 < l < 1$, the curve is asymptotic to both axes.

The expected value of the variate in the distribution is given by

$$\bar{x} = \frac{1}{\Gamma(l)}\int_0^\infty e^{-x}x^{l}\,dx = \frac{\Gamma(l + 1)}{\Gamma(l)} = l. \tag{16}$$

The second moment about $x = 0$ is similarly

$$\mu'_2 = \frac{1}{\Gamma(l)}\int_0^\infty e^{-x}x^{l+1}\,dx = \Gamma(l + 2)/\Gamma(l) = l(l + 1).$$


* This belongs to Karl Pearson's Type III. See Kendall, 1943, 2, pp. 137-43.

Hence the variance is given by

$$\mu_2 = l(l + 1) - l^2 = l. \tag{17}$$

Example. Show that the $r$th moment about $x = 0$ is

$$\mu'_r = l(l + 1)(l + 2)\dots(l + r - 1),$$

and deduce that the third moment about the mean is $2l$.
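The statements of this example are readily confirmed in exact arithmetic. The following Python sketch is illustrative and not part of the original text: the function names and the value chosen for $l$ are assumptions. It forms the moments about $x = 0$ as rising products and converts them to moments about the mean.

```python
from fractions import Fraction

def gamma_raw_moment(l, r):
    """mu'_r = l (l + 1) ... (l + r - 1) for a Gamma variate with parameter l."""
    value = Fraction(1)
    for k in range(r):
        value *= l + k
    return value

def central_from_raw(m1, m2, m3):
    """Second and third moments about the mean from moments about x = 0."""
    return m2 - m1**2, m3 - 3 * m1 * m2 + 2 * m1**3

l = Fraction(7, 2)   # illustrative parameter value
m1, m2, m3 = (gamma_raw_moment(l, r) for r in (1, 2, 3))
variance, third = central_from_raw(m1, m2, m3)
print(variance == l, third == 2 * l)
```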


An important example of a Gamma variate is associated with the normal distribution. If $x$ is normally distributed with mean $a$ and s.d. $\sigma$, the probability that a random value of the variate will fall in the interval $dx$ is

$$dP = \frac{1}{\sigma\sqrt{2\pi}}\exp[-(x - a)^2/2\sigma^2]\,dx. \tag{18}$$

Let $u$ be defined by

$$u = \tfrac12(x - a)^2/\sigma^2,$$

so that, as $x$ varies from $-\infty$ to $+\infty$, $u$ varies from $+\infty$ to 0 and then from 0 to $+\infty$. For values of $x$ between $a$ and $+\infty$,

$$x - a = \sigma\sqrt{2u}$$

and

$$dx = \sigma\,du/\sqrt{2u},$$

so that

$$dP = \frac{1}{2\sqrt{\pi}}\,e^{-u}u^{-1/2}\,du.$$

But the probability that $u$ falls in the interval $du$ is double this, since there is an equal probability that $u$ will fall in this interval when $x$ lies between $-\infty$ and $a$. Consequently the probability differential for the variate $u$ is

$$\frac{e^{-u}u^{-1/2}\,du}{\Gamma(\tfrac12)},$$

so that $u$ is a Gamma variate with parameter $\tfrac12$. We may therefore state

Theorem I. If $x$ is normally distributed with mean $a$ and standard deviation $\sigma$, then $\tfrac12(x - a)^2/\sigma^2$ is a Gamma variate with parameter $\tfrac12$.

A Gamma variate with parameter $l$ may be referred to briefly as a $\gamma(l)$ variate, the symbol being used adjectively.*

* The objection to describing it as a $\Gamma(l)$ variate is the additional meaning thus given to the symbol $\Gamma(l)$.




68. Sum of independent Gamma variates 

The moments of the Gamma distribution, and the distribution of the sum of independent Gamma variates, may be deduced from the moment generating function. The m.g.f. of a $\gamma(l)$ variate with respect to the origin is given by

$$M(t) = \int_0^\infty \frac{e^{tx}e^{-x}x^{l-1}}{\Gamma(l)}\,dx = (1 - t)^{-l} \qquad (|t| < 1), \tag{19}$$

and the cumulative function is therefore

$$K(t) = -l\log(1 - t) = l\left(t + t^2/2 + t^3/3 + t^4/4 + \dots\right). \tag{20}$$

Thus the mean of the distribution is $l$, and the variance also $l$, while the other cumulants are

$$\kappa_3 = 2!\,l, \quad \kappa_4 = 3!\,l, \quad \dots, \quad \kappa_r = (r - 1)!\,l. \tag{21}$$
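The cumulants of the Gamma distribution may also be recovered from its moments about the origin, which gives an independent check on the values just stated. The Python sketch below is an illustration under stated assumptions: it uses the standard recursion connecting moments and cumulants, the moments are taken as the rising products $l(l+1)\dots(l+r-1)$, and the function names and the value of $l$ are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def gamma_raw_moments(l, order):
    """[1, mu'_1, ..., mu'_order] with mu'_r = l (l + 1) ... (l + r - 1)."""
    moments = [Fraction(1)]
    for r in range(1, order + 1):
        moments.append(moments[-1] * (l + r - 1))
    return moments

def cumulants_from_raw_moments(raw):
    """Uses mu'_r = sum_{k=1..r} C(r-1, k-1) kappa_k mu'_{r-k}, solved for kappa_r."""
    kappa = [Fraction(0)] * len(raw)
    for r in range(1, len(raw)):
        correction = sum(comb(r - 1, k - 1) * kappa[k] * raw[r - k]
                         for k in range(1, r))
        kappa[r] = raw[r] - correction
    return kappa[1:]

l = Fraction(5, 2)   # illustrative parameter value
kappa = cumulants_from_raw_moments(gamma_raw_moments(l, 6))
expected = [factorial(r - 1) * l for r in range(1, 7)]
print(kappa == expected)
```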

Suppose now that $x$ and $y$ are independent Gamma variates with parameters $l$ and $m$ respectively. Then the m.g.f. of their sum, being equal to the product of their m.g.f.'s, is $(1 - t)^{-(l+m)}$. But this is the m.g.f. of a $\gamma(l + m)$ variate. We thus have

Theorem II. The sum of two independent Gamma variates, with parameters $l$ and $m$, is a Gamma variate with parameter $l + m$.

The converse of this theorem is almost equally important. It may be stated:

Theorem III. If the sum of two independent positive variates is a Gamma variate with parameter $l + m$, and one of them is a Gamma variate with parameter $l$, then the other is a Gamma variate with parameter $m$.

For, if $M(t)$ denotes the m.g.f. of the last variate, we have, on equating the m.g.f. of the sum to the product of those of its components,

$$(1 - t)^{-(l+m)} = (1 - t)^{-l}M(t),$$

whence

$$M(t) = (1 - t)^{-m},$$

and the second component is therefore a $\gamma(m)$ variate.



On account of the importance of these theorems we shall prove Theorem II from first principles. Let $x$ and $y$ be independent $\gamma(l)$ and $\gamma(m)$ variates, and let $z = x + y$. Suppose first that $y$ has a fixed value in the interval $dy$. Then $dz = dx$. The probability that a random value of $x$ will fall in the interval $dx$ is

$$dp = e^{-x}x^{l-1}\,dx/\Gamma(l),$$

and therefore the probability that, for a given value of $y$, $z$ will lie in the interval $dz$ is

$$dp = e^{-(z-y)}(z - y)^{l-1}\,dz/\Gamma(l).$$

But the chance that $y$ will have a value in the interval $dy$ is

$$dp' = e^{-y}y^{m-1}\,dy/\Gamma(m).$$

The probability that simultaneously $z$ will lie in the interval $dz$ and $y$ in the interval $dy$ is the product $dp\,dp'$. By integration we then have the probability that, for any value of $y$, $z$ will lie in the interval $dz$ as

$$dP = \frac{e^{-z}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^z (z - y)^{l-1}y^{m-1}\,dy.$$

To evaluate the integral put $y = zt$, so that $dy = z\,dt$. Since $z$ is the sum of the positive variates $x$ and $y$, it is never less than $y$, so that $t$ lies within the range 0 to 1. Consequently

$$dP = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^1 (1 - t)^{l-1}t^{m-1}\,dt = \frac{B(l, m)}{\Gamma(l)\,\Gamma(m)}\,e^{-z}z^{l+m-1}\,dz = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l + m)},$$

and $z$ is therefore a $\gamma(l + m)$ variate as stated.

Repeated application of this theorem shows that the sum of $n$ independent Gamma variates with parameters $l_i$ ($i = 1, 2, \dots, n$) is a Gamma variate with parameter $\sum_i l_i$. In particular, in virtue of Theorem I, we may state

Theorem IV. If $x_i$ ($i = 1, 2, \dots, n$) are $n$ independent variates, normally distributed about a common mean zero with standard deviations $\sigma_i$, and $u = \tfrac12\sum_i x_i^2/\sigma_i^2$, then $u$ is a Gamma variate with parameter $\tfrac12 n$.




69. Beta distribution of the first kind 


In virtue of the first definition, (7), of the Beta function we shall say that a continuous variate $x$, which is distributed with probability density

$$\phi(x) = \frac{x^{l-1}(1 - x)^{m-1}}{B(l, m)} \tag{22}$$

throughout the range of values 0 to 1, is a Beta variate of the first kind with parameters $l$ and $m$; and its distribution is a Beta distribution of the first kind.* Such a variate may be referred to briefly as a $\beta_1(l, m)$ variate. The reader should sketch the probability curve for the distribution, distinguishing the different cases. He will find that, if $l$ and $m$ are each greater than 1, there is a modal value

$$x = (l - 1)/(l + m - 2).$$

If $l > 2$ the curve touches the $x$-axis at the origin; while, if $1 < l < 2$, it is tangent to the $y$-axis at that point. If, however, $0 < l < 1$, the curve is asymptotic to the $y$-axis. Similar remarks hold for the shape of the curve near $x = 1$, according to the value of $m$.

The mean value of $x$ is given by

$$\bar{x} = \int_0^1 \frac{x^{l}(1 - x)^{m-1}}{B(l, m)}\,dx = \frac{B(l + 1, m)}{B(l, m)} = \frac{l}{l + m}. \tag{23}$$

The second moment about $x = 0$ is similarly

$$\mu'_2 = \frac{B(l + 2, m)}{B(l, m)} = \frac{l(l + 1)}{(l + m)(l + m + 1)}.$$

From these it follows that the variance is

$$\mu_2 = \frac{lm}{(l + m)^2(l + m + 1)}. \tag{24}$$
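The mean and variance of the Beta distribution of the first kind may be checked by forming the ratios of Beta functions directly. The following Python sketch is illustrative; the function name `beta_function` and the values of $l$ and $m$ are assumptions, and the Beta function is computed from Gamma functions by relation (12).

```python
import math

def beta_function(l, m):
    """B(l, m) computed from Gamma functions by relation (12)."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)

l, m = 2.5, 4.0   # illustrative parameters

mean = beta_function(l + 1, m) / beta_function(l, m)
second_moment = beta_function(l + 2, m) / beta_function(l, m)
variance = second_moment - mean**2

closed_mean = l / (l + m)
closed_variance = l * m / ((l + m)**2 * (l + m + 1))
print(mean, closed_mean)
print(variance, closed_variance)
```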

Corresponding to Theorem II there is a fundamental theorem which may be stated:

Theorem V. If $x$ and $y$ are independent Gamma variates with parameters $l$ and $m$ respectively, the quotient $x/(x + y)$ is a Beta variate of the first kind with parameters $l$ and $m$.

This may be proved from first principles. If we write

$$z = \frac{x}{x + y}, \quad\text{then}\quad x = \frac{yz}{1 - z},$$

and since $x$ and $y$ are both positive, the range of $z$ is from 0 to 1. Suppose first that $y$ has a fixed value in the interval $dy$. Then

$$dx = y\,dz/(1 - z)^2.$$

The probability that, for this value of $y$, $x$ will lie in the interval $dx$ and therefore $z$ in the interval $dz$ is

$$dp = \frac{e^{-x}x^{l-1}\,dx}{\Gamma(l)} = \frac{1}{\Gamma(l)}\exp\left(-\frac{yz}{1 - z}\right)\left(\frac{yz}{1 - z}\right)^{l-1}\frac{y\,dz}{(1 - z)^2}.$$

But the chance that $y$ will have a value in the interval $dy$ is

$$dp' = \frac{e^{-y}y^{m-1}\,dy}{\Gamma(m)}.$$

The probability that simultaneously $z$ will lie in the interval $dz$ and $y$ in the interval $dy$ is the product $dp\,dp'$. By integration we then find the probability that, for any value of $y$, $z$ will lie in the interval $dz$ as

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)\,(1 - z)^{l+1}}\int_0^\infty \exp\left(-\frac{y}{1 - z}\right)y^{l+m-1}\,dy.$$

To evaluate the integral put $y = (1 - z)t$. Then the limits for $t$ are 0 and $\infty$, and we obtain

$$dP = \frac{\Gamma(l + m)}{\Gamma(l)\,\Gamma(m)}\,z^{l-1}(1 - z)^{m-1}\,dz = \frac{z^{l-1}(1 - z)^{m-1}\,dz}{B(l, m)},$$

showing that $z$ is a $\beta_1(l, m)$ variate as stated.

* This belongs to Karl Pearson's Type I.

We may remark in passing that, if $z$ is a $\beta_1(l, m)$ variate and $v = 1 - z$, then $v$ is a $\beta_1(m, l)$ variate. For the range of $v$ is also from 0 to 1, and $|dv| = |dz|$. Expressing the probability differential for $z$ in terms of $v$ and $dv$, we obtain

$$dP = v^{m-1}(1 - v)^{l-1}\,dv/B(l, m),$$

which shows that $v$ is a $\beta_1(m, l)$ variate. And the reader will observe that $dP$ depends on the magnitude $|dv|$ of the interval $dv$, but not upon the sign of $dv/dz$.


70. Alternative proof of theorems 
A combined proof of Theorems II and V may be given more simply as follows.* As before let $x$ and $y$ be independent $\gamma(l)$ and $\gamma(m)$ variates respectively. Then the probability that a random value of $x$ will fall in the interval $dx$ and, at the same time, a random value of $y$ will fall in the interval $dy$ is

$$dP = \frac{e^{-x-y}x^{l-1}y^{m-1}}{\Gamma(l)\,\Gamma(m)}\,dx\,dy.$$

* This proof is given by Sawkins, 1940, 3, p. 212.

Now introduce the new variables

$$u = x + y, \qquad v = x/(x + y),$$

so that

$$x = uv, \qquad y = u(1 - v).$$

Then as $x$ and $y$ range from 0 to $\infty$, $u$ ranges from 0 to $\infty$ and $v$ from 0 to 1. Also

$$\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} v & u \\ 1 - v & -u \end{vmatrix} = -u.$$

From the above expression for the probability differential $dP$ it follows that the probability density in the joint distribution of $x$ and $y$ is

$$\frac{e^{-x-y}x^{l-1}y^{m-1}}{\Gamma(l)\,\Gamma(m)} = \frac{e^{-u}u^{l+m-2}v^{l-1}(1 - v)^{m-1}}{\Gamma(l)\,\Gamma(m)}.$$

But the area of the element of the $xy$-plane bounded by the curves along which $u$ has the values $u$ and $u + du$ respectively, and those along which $v$ has the values $v$ and $v + dv$ is

$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv = u\,du\,dv.$$

Hence the probability that, in a random selection of $x$ and $y$, the representative point $(x, y)$ will fall in this element of area is

$$dP = \frac{e^{-u}u^{l+m-1}\,du}{\Gamma(l + m)}\cdot\frac{v^{l-1}(1 - v)^{m-1}\,dv}{B(l, m)}. \tag{25}$$

Since this is the probability that simultaneously $u$ will fall in the interval $du$ and $v$ in the interval $dv$ it follows that these variates are independent, and that $u$ is a $\gamma(l + m)$ variate and $v$ a $\beta_1(l, m)$ variate, as stated in the above theorems.
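The independence of the sum and the ratio, and the means of the two resulting distributions, may be illustrated by simulation. The sketch below uses Python's standard `random.gammavariate` with unit scale to draw the Gamma variates; the seed, the number of trials, the parameter values and the function names are assumptions, and agreement is only to within sampling error.

```python
import math
import random

def simulate_sum_and_ratio(l, m, trials, seed=12345):
    """Draw independent gamma(l), gamma(m) variates; return u = x + y and v = x/(x + y)."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(trials):
        x = rng.gammavariate(l, 1.0)   # density e^(-x) x^(l-1) / Gamma(l)
        y = rng.gammavariate(m, 1.0)
        us.append(x + y)
        vs.append(x / (x + y))
    return us, vs

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma)**2 for p in a) * sum((q - mb)**2 for q in b))

l, m = 2.0, 3.0
us, vs = simulate_sum_and_ratio(l, m, trials=20000)
mean_u, mean_v, corr_uv = mean(us), mean(vs), correlation(us, vs)
print(mean_u, l + m)          # mean of a gamma(l + m) variate
print(mean_v, l / (l + m))    # mean of a beta variate of the first kind
print(corr_uv)                # near zero, as independence requires
```

A vanishing correlation is of course only a necessary consequence of independence, not a proof of it; the proof is that given in the text.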





71. Product of a $\beta_1(l, m)$ variate and a $\gamma(l + m)$ variate

We may now prove an important property, suggested by Theorem V, which may be stated:

Theorem VI. The product of a $\beta_1(l, m)$ variate and an independent $\gamma(l + m)$ variate is a $\gamma(l)$ variate.

Let $v$ be the $\beta_1(l, m)$ variate and $u$ the $\gamma(l + m)$ variate, and let $z = uv$. The probability differential for $v$ is

$$\frac{1}{B(l, m)}\,v^{l-1}(1 - v)^{m-1}\,dv.$$

Hence, for a fixed value of $u$ in the interval $du$, the probability that $z$ will fall in the interval $dz$ is

$$dp = \frac{1}{B(l, m)}\left(\frac{z}{u}\right)^{l-1}\left(1 - \frac{z}{u}\right)^{m-1}\frac{dz}{u}.$$

Multiply this by the probability that a random value of $u$ will fall in the interval $du$, and integrate over the range of $u$ from $z$ to $\infty$ (since $z \le u$). Thus we have the probability that, for any value of $u$, $z$ will lie in the interval $dz$ as

$$dP = \frac{z^{l-1}\,dz}{B(l, m)\,\Gamma(l + m)}\int_z^\infty e^{-u}\left(1 - \frac{z}{u}\right)^{m-1}u^{m-1}\,du.$$

To evaluate the integral put $u - z = zt$. Then $t$ ranges from 0 to $\infty$, and we have for the probability differential of $z$,

$$dP = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)}\int_0^\infty e^{-zt}(zt)^{m-1}z\,dt = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)}.$$

Consequently $z$ is a $\gamma(l)$ variate as stated.


72. Beta distribution of the second kind 
In agreement with the alternative definition of $B(m, n)$ contained in (11) we may define a Beta variate of the second kind, with positive parameters $l$ and $m$, as a continuous variate $x$ which is distributed with probability density

$$\phi(x) = \frac{x^{l-1}}{B(l, m)\,(1 + x)^{l+m}} \tag{27}$$

throughout the range $x = 0$ to $x = \infty$. Such a variate may be referred to briefly as a $\beta_2(l, m)$ variate. Its distribution is a Beta distribution of the second kind.* It is important to remember that, while the values of a Beta variate of the first kind are from 0 to 1, those of a Beta variate of the second kind are from 0 to $\infty$. The reader should sketch the different forms of the probability curve for the latter distribution. He will observe a considerable resemblance to the case of a Gamma distribution. If $l > 1$ there is a mode at

$$x = (l - 1)/(m + 1). \tag{28}$$

The curve is asymptotic to the $x$-axis; and, if $l > 2$, it touches this axis also at the origin. If $1 < l < 2$ the curve touches the $y$-axis at the origin; and if $0 < l < 1$ the curve is asymptotic to both axes. When $m > 1$ the mean value of the variate is given by

$$\bar{x} = \frac{1}{B(l, m)}\int_0^\infty \frac{x^{l}\,dx}{(1 + x)^{l+m}} = \frac{B(l + 1, m - 1)}{B(l, m)} = \frac{l}{m - 1}. \tag{29}$$

Also, if $m > 2$, the second moment about $x = 0$ is

$$\mu'_2 = \frac{1}{B(l, m)}\int_0^\infty \frac{x^{l+1}\,dx}{(1 + x)^{l+m}} = \frac{B(l + 2, m - 2)}{B(l, m)} = \frac{l(l + 1)}{(m - 1)(m - 2)}.$$

Consequently the variance is

$$\mu_2 = \frac{l(l + 1)}{(m - 1)(m - 2)} - \frac{l^2}{(m - 1)^2} = \frac{l(l + m - 1)}{(m - 1)^2(m - 2)}. \tag{30}$$
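The mode, mean and variance of the Beta distribution of the second kind may be checked numerically. The Python sketch below is illustrative only; the function names and the values of $l$ and $m$ (taken with $l > 1$ and $m > 2$ so that all three quantities exist) are assumptions.

```python
import math

def beta_function(l, m):
    """B(l, m) computed from Gamma functions."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)

def beta2_density(x, l, m):
    """Density x^(l-1) / (B(l, m) (1 + x)^(l+m)) on the range 0 to infinity."""
    return x**(l - 1) / (beta_function(l, m) * (1 + x)**(l + m))

l, m = 2.0, 5.5   # illustrative parameters

mean = beta_function(l + 1, m - 1) / beta_function(l, m)
second_moment = beta_function(l + 2, m - 2) / beta_function(l, m)
variance = second_moment - mean**2

closed_mean = l / (m - 1)
closed_variance = l * (l + m - 1) / ((m - 1)**2 * (m - 2))
mode = (l - 1) / (m + 1)
print(mean, closed_mean)
print(variance, closed_variance)
print(mode, beta2_density(mode, l, m))
```

The density at the stated mode exceeds its value at neighbouring points on either side, which is a simple numerical confirmation of the position of the mode.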


We may observe that, if $x$ is a Beta variate of the second kind, its reciprocal is a variate of the same kind with parameters interchanged. For, if $x$ is a $\beta_2(l, m)$ variate, its probability density is given by (27). If then we put $x = 1/y$, we have $|dx| = |dy|/y^2$; and therefore, since $y$ lies in the interval $dy$ when $x$ lies in the interval $dx$, the probability of this is

$$dP = \frac{(1/y)^{l-1}\,dy}{y^2\,B(l, m)\,(1 + 1/y)^{l+m}} = \frac{y^{m-1}\,dy}{B(l, m)\,(1 + y)^{l+m}}.$$

Consequently $y$ is a $\beta_2(m, l)$ variate.

* This belongs to Karl Pearson's Type VI.



An important relation between the two kinds of Beta variates is expressed by

Theorem VII. To each Beta variate of the first kind corresponds a pair of Beta variates of the second kind; and conversely.

Let $v$ be a $\beta_1(l, m)$ variate, so that its probability differential is

$$dP = \frac{v^{l-1}(1 - v)^{m-1}\,dv}{B(l, m)}, \tag{31}$$

and let $w$ be defined by

$$v = 1/(1 + w) \tag{32}$$

or its equivalent $w = (1 - v)/v$. Then $|dv| = |dw|/(1 + w)^2$, and by substitution we have the probability differential of $w$ as

$$dP = \frac{w^{m-1}\,dw}{B(l, m)\,(1 + w)^{l+m}}. \tag{33}$$

Since the range of $w$ is from 0 to $\infty$, it follows that $w$ is a $\beta_2(m, l)$ variate. Its reciprocal is therefore a $\beta_2(l, m)$ variate, and the first part of the theorem is proved.

Conversely, given that $w$ is a $\beta_2(m, l)$ variate with probability differential (33), we may define a variate $v$ by means of (32). Substitution then shows that the probability differential of $v$ is given by (31); and since $v$ ranges from 0 to 1, it is a $\beta_1(l, m)$ variate. The variable $1 - v$ is therefore a $\beta_1(m, l)$ variate, and the second part of the theorem is proved.


73. Quotient of independent Gamma variates 

Corresponding to Theorem V, according to which a Beta variate of the first kind is determined by two independent Gamma variates, we have

Theorem VIII. The quotient of two independent Gamma variates, with parameters $l$ and $m$, is a $\beta_2(l, m)$ variate.

Let $x$ and $y$ be independent Gamma variates with parameters $l$ and $m$ respectively, and let $v = y/(y + x)$. Then, in virtue of Theorem V, $v$ is a $\beta_1(m, l)$ variate. But, if $w$ is the quotient $x/y$, clearly

$$v = \frac{1}{1 + x/y} = \frac{1}{1 + w},$$

so that $w$ is a $\beta_2(l, m)$ variate as stated.



78] Quotient of Gamma Variates 159 


The theorem may also be proved by the method of § 70, which
leads to the additional information that the sum $x + y$ is distributed
independently of the quotient $x/y$. For if

$$u = x + y, \qquad v = x/y,$$

we have

$$\left|\frac{\partial(u, v)}{\partial(x, y)}\right| = \frac{x + y}{y^2} = \frac{(1+v)^2}{u},$$

and therefore

$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| = \frac{u}{(1+v)^2}.$$

Since the probability that $x$ and $y$ fall simultaneously in the
intervals $dx$ and $dy$ is

$$dp = \frac{x^{l-1} y^{m-1} e^{-x-y}\,dx\,dy}{\Gamma(l)\,\Gamma(m)},$$

the probability density for their joint distribution is

$$\frac{e^{-x-y} x^{l-1} y^{m-1}}{\Gamma(l)\,\Gamma(m)} = \frac{e^{-u} u^{l+m-2} v^{l-1}}{\Gamma(l)\,\Gamma(m)\,(1+v)^{l+m-2}}.$$

Consequently the probability that, in a random choice of $x$ and $y$,
the representative point $(x, y)$ will fall in the area bounded by the
curves $u$, $u + du$, $v$, $v + dv$ is

$$dP = \frac{e^{-u} u^{l+m-1}\,du}{\Gamma(l+m)} \cdot \frac{v^{l-1}\,dv}{B(l, m)\,(1+v)^{l+m}}.$$

Since this is the probability that $u$ and $v$ will fall simultaneously in
the intervals $du$ and $dv$ respectively, it follows that $u$ is a $\gamma(l + m)$
variate, and $v$ a $\beta_2(l, m)$ variate, and also that these variates are
independent.
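
The factorisation of the joint density can be verified point by point. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the Gamma and second-kind Beta densities are taken in their standard forms.

```python
import math

def gamma_density(x, l):
    """Density of a Gamma variate with parameter l, for x > 0."""
    return x ** (l - 1) * math.exp(-x) / math.gamma(l)

def beta2_density(v, l, m):
    """Density of a Beta variate of the second kind, for v > 0."""
    b = math.gamma(l) * math.gamma(m) / math.gamma(l + m)
    return v ** (l - 1) / (b * (1 + v) ** (l + m))

def joint_density_uv(u, v, l, m):
    """Joint density of u = x + y and v = x / y by change of variables."""
    x = u * v / (1 + v)
    y = u / (1 + v)
    jacobian = u / (1 + v) ** 2          # |d(x, y) / d(u, v)|
    return gamma_density(x, l) * gamma_density(y, m) * jacobian

def product_of_marginals(u, v, l, m):
    """Density of a gamma(l + m) variate times that of a beta_2(l, m) variate."""
    return gamma_density(u, l + m) * beta2_density(v, l, m)

# Illustrative (assumed) parameters and evaluation points.
l, m = 1.5, 3.0
for u, v in ((0.5, 0.2), (2.0, 1.0), (4.0, 3.5)):
    print(u, v, joint_density_uv(u, v, l, m), product_of_marginals(u, v, l, m))
```

Agreement of the two columns at every point is what the statement of independence amounts to.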

Example 1. Cauchy's distribution. The distribution of the quotient of two
independent standard normal variates follows from the above theorem. For
if $z$ is this quotient, $z^2$ is the quotient of two independent Gamma variates,
each of parameter $\tfrac12$. Consequently $z^2$ is a $\beta_2(\tfrac12, \tfrac12)$ variate, with probability
differential

$$dp = \frac{(z^2)^{-1/2}\,d(z^2)}{B(\tfrac12, \tfrac12)\,(1+z^2)} = \frac{(z^2)^{-1/2}\,d(z^2)}{\pi(1+z^2)}.$$

The distribution of $z$ follows immediately from this. For, since the range of
$z$ is from $-\infty$ to $+\infty$ while that of $z^2$ is from 0 to $+\infty$, the probability
differential of $z$ is

$$dp = \frac{dz}{\pi(1+z^2)},$$

the factor 2 disappearing, since the integral from $-\infty$ to $+\infty$ must be equal
to unity. This distribution is associated with the name of Cauchy.
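
A rough simulation illustrates the result. This is a sketch only: the function name, the number of trials and the seed are assumptions made for the illustration. For the Cauchy distribution the probability that $|z| < c$ is $(2/\pi)\arctan c$, which equals one half when $c = 1$.

```python
import math
import random

def fraction_within(bound, trials=20000, seed=1):
    """Fraction of quotients z = x / y of standard normal variates with |z| < bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if abs(x) < bound * abs(y):      # same event as |x / y| < bound, without dividing
            hits += 1
    return hits / trials

for bound in (1.0, math.sqrt(3.0)):
    expected = (2.0 / math.pi) * math.atan(bound)   # Cauchy probability of |z| < bound
    print(bound, fraction_within(bound), expected)
```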




Example 2. Show that, if $x$ and $y$ are independent normal variates with
means $m_1$, $m_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively, the quotient

$$z = (x - m_1)/(y - m_2)$$

conforms to the distribution

$$dp = \frac{\sigma_1 \sigma_2\,dz}{\pi(\sigma_1^2 + \sigma_2^2 z^2)},$$

with a range $-\infty$ to $+\infty$.

COLLATERAL READING 

Cochran, 1934, 6, pp. 178-91. 
Pitman, 1937, 8, pp. 216-18. 
Sawkins, 1940, 3, pp. 209-17. 


EXAMPLES VIII 

1. We shall see later (cf. § 82) that, if $r$ is the coefficient of
correlation between the variates in a random sample of $n$ pairs of values
from an uncorrelated bivariate normal population, then $r^2$ is a
$\beta_1(\tfrac12, \tfrac12(n-2))$ variate. Hence show that the mean value of $r^2$ for
such samples is $1/(n-1)$, and the s.e. of $r$ therefore $1/\sqrt{n-1}$. Also
show that the probability differential of $r$ is

$$dp = \frac{(1-r^2)^{\frac12(n-4)}\,dr}{B(\tfrac12, \tfrac12(n-2))}.$$

2. Quotient of independent Gamma variates. Theorem VIII may
be proved by direct integration, as in the cases of Theorems II, V
and VI. Let $z = x/y$, where $x$ and $y$ are independent $\gamma(l)$ and $\gamma(m)$
variates. Then for a fixed value of $y$ in the interval $dy$, $dx = y\,dz$,
and the probability that $z$ will lie in the interval $dz$ is

$$dp = e^{-yz}(yz)^{l-1}\,y\,dz/\Gamma(l).$$

Consequently the probability that, for any value of $y$, $z$ will lie in
the interval $dz$ is

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\,\Gamma(m)} \int_0^\infty y^{l+m-1} e^{-y(1+z)}\,dy = \frac{z^{l-1}\,dz}{B(l, m)\,(1+z)^{l+m}},$$

showing that $z$ is a $\beta_2(l, m)$ variate as required.




3. Show that, for the $\gamma(l)$ distribution,

$$(\text{mean} - \text{mode})/\sigma = 1/\sqrt{l} = \tfrac12\gamma_1.$$

Hence some writers prefer to take $\tfrac12\gamma_1$ as the definition of skewness.
Show that the excess of kurtosis of the distribution is $6/l$.

4. Show that the mean value of the positive square root of a
$\gamma(l)$ variate is $\Gamma(l+\tfrac12)/\Gamma(l)$. Hence prove that the mean deviation of
a normal variate from its mean is $\sigma\sqrt{2/\pi}$.

5. Prove that the mean value of the positive square root of a
$\beta_1(l, m)$ variate is $\Gamma(l+\tfrac12)\,\Gamma(l+m)/\Gamma(l)\,\Gamma(l+m+\tfrac12)$. Hence, using
the distribution of $r^2$ given in Ex. 1, for samples from an uncorrelated
bivariate normal population, show that the mean value of
$|r|$ is $\Gamma(\tfrac12(n-1))/\Gamma(\tfrac12 n)\sqrt{\pi}$.

6. A simple sample of $n$ values is drawn from a population with a
$\gamma(l)$ distribution. Show that, if $\bar{x}$ is the mean of the sample, $n\bar{x}$ is
a $\gamma(nl)$ variate; and deduce that $E(\bar{x}) = l$, and that the sampling
variance of $\bar{x}$ is $l/n$.

7. A simple sample of $n$ values is drawn from a population with
the exponential distribution whose probability density is $a e^{-ax}$,
$(0 \le x)$. Show that, if $\bar{x}$ is the mean of the sample, $na\bar{x}$ is a $\gamma(n)$
variate; and deduce that $E(\bar{x}) = 1/a$, and that the s.e. of $\bar{x}$ is $1/(a\sqrt{n})$.

8. Show that, if $v$ is the square of a $\gamma(l)$ variate, its probability
differential is

$$dp = v^{\frac12 l - 1} e^{-\sqrt{v}}\,dv/2\Gamma(l).$$

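9. Show that, if $x_i$ ($i = 1, \ldots, n$) are $n$ independent Gamma variates
with parameters $m_i$, any homogeneous function $F(x_1, \ldots, x_n)$ of
these variates, of degree 0, is distributed independently of the sum
$\sum_i x_i$. (Cf. Pitman, 1937, 8, pp. 216-17.)

This may be done by considering the m.g.f. of the simultaneous
distribution of $F$ and $\sum x_i$. As explained in § 31 this m.g.f. is

$$M(t_1, t_2) = E\{\exp(t_1 \textstyle\sum x_i + t_2 F)\}$$

$$= C \int_0^\infty \cdots \int_0^\infty \exp(-\textstyle\sum x_i) \prod_i x_i^{m_i - 1} \times \exp(t_1 \textstyle\sum x_i + t_2 F)\,dx_1 \ldots dx_n,$$

where $1/C = \Gamma(m_1) \ldots \Gamma(m_n)$.

Substituting $x_i(1 - t_1) = y_i$ we have

$$M(t_1, t_2) = C (1 - t_1)^{-\sum m_i} \int_0^\infty \cdots \int_0^\infty \exp(-\textstyle\sum y_i) \prod_i y_i^{m_i - 1} \times \exp(t_2 F(y_1, \ldots, y_n))\,dy_1 \ldots dy_n,$$

since $F$ is homogeneous of degree zero. Thus

$$M(t_1, t_2) = (\text{function of } t_1) \times (\text{function of } t_2),$$

and the variates $F$ and $\sum x_i$ are therefore independent.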

10. Given that the incomplete Beta function $B_x(l, m)$ is defined by

$$B_x(l, m) = \int_0^x x^{l-1}(1 - x)^{m-1}\,dx,$$

and that $I_x(l, m) = B_x(l, m)/B(l, m)$, prove the relations

$$I_x(l, m) = 1 - I_{1-x}(m, l)$$

and

$$I_x(l+1, m+1) = x\,I_x(l, m+1) + (1-x)\,I_x(l+1, m).$$

11. Simultaneous sampling distribution of the mean and the
variance. The independence of the sampling distributions of the
mean $\bar{x}$ and the variance $S^2$, of a random sample of $n$ values from a
normal population, was proved by Fisher from the simultaneous
sampling distribution of these statistics. The probability that the
$n$ values of the sample will fall in the respective intervals $dx_1, \ldots, dx_n$
is, by the theorem of compound probability,

$$dp = (\sigma\sqrt{2\pi})^{-n} \exp\left[-\sum_i (x_i - \mu)^2/2\sigma^2\right] dx_1\,dx_2 \ldots dx_n,$$

the $n$ values being chosen independently from the normal population
of mean $\mu$ and variance $\sigma^2$. But

$$\sum_i (x_i - \mu)^2 = \sum_i (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 = nS^2 + n(\bar{x} - \mu)^2,$$

so that

$$dp = (\sigma\sqrt{2\pi})^{-n} \exp[-n(\bar{x} - \mu)^2/2\sigma^2] \exp(-nS^2/2\sigma^2)\,dx_1 \ldots dx_n.$$

Fisher proved by geometrical reasoning* that this probability
differential is expressible as

$$dp = C \exp[-n(\bar{x} - \mu)^2/2\sigma^2]\,d\bar{x} \cdot \exp(-nS^2/2\sigma^2)\,S^{n-2}\,dS,$$

where $C$ is a constant. Since this is of the form

$$dp = \phi_1(\bar{x})\,d\bar{x} \cdot \phi_2(S)\,dS,$$

the distributions of $\bar{x}$ and $S$ are independent. The forms of $\phi_1(\bar{x})$
and $\phi_2(S)$ show that $\bar{x}$ is normally distributed, with mean $\mu$ and
variance $\sigma^2/n$, and that $nS^2/2\sigma^2$ is a Gamma variate with parameter
$\tfrac12(n - 1)$.

Another proof of the independence of $\bar{x}$ and $S^2$ will be given in § 77.

12. Show that the $r$th moment of a $\beta_1(l, m)$ distribution about
$x = 0$ is

$$\frac{l(l+1) \ldots (l+r-1)}{(l+m)(l+m+1) \ldots (l+m+r-1)}.$$

13. Show that, if $r$ is less than $m$, the $r$th moment of a $\beta_2(l, m)$
distribution about $x = 0$ is

$$\frac{l(l+1) \ldots (l+r-1)}{(m-1)(m-2) \ldots (m-r)}.$$

14. Defining the harmonic mean (h.m.) of a variate $x$ as the
reciprocal of the expected value of $1/x$, show that, if $l > 1$, the h.m.
of a $\gamma(l)$ variate is $l - 1$, that of a $\beta_1(l, m)$ variate is $(l-1)/(l+m-1)$,
and that of a $\beta_2(l, m)$ variate is $(l-1)/m$, $m$ being positive.

* Fisher, 1925, 1, pp. 92-3.



CHAPTER IX 


CHI-SQUARE AND SOME APPLICATIONS 

74. Chi-square and its distribution 

We have already seen (§ 68, Theorem IV) that if $x_i$ ($i = 1, 2, \ldots, n$)
are $n$ independent variates normally distributed about a common
mean zero with standard deviations $\sigma_i$, and

$$\chi^2 = \sum_i x_i^2/\sigma_i^2, \tag{1}$$

then $\tfrac12\chi^2$ is a Gamma variate with parameter $\tfrac12 n$. The distribution
of $\chi^2$ is therefore

$$dP = \frac{1}{2^{n/2}\,\Gamma(\tfrac12 n)}\,(\chi^2)^{\frac12 n - 1} e^{-\frac12\chi^2}\,d(\chi^2), \tag{2}$$

which expresses the probability that the value of $\chi^2$ found from a
random sample will fall in the interval $d(\chi^2)$. This distribution is
often referred to as the $\chi^2$ distribution, and a variate conforming to
it is said to be distributed like $\chi^2$. Since $\chi^2$ is twice a $\gamma(\tfrac12 n)$ variate,
its mean value is $n$ and its modal value $n - 2$, as proved in § 67.
Similarly, the variance of $\chi^2$ is $2n$.

The distribution (2) was discovered by Helmert in 1875, and
rediscovered independently in 1900 by Karl Pearson, who devised
by means of it the test of 'goodness of fit' which will soon be
considered. We must first, however, examine the effect of one or
more linear relations between the variates $x_i$; and, to make this
clearer, we shall digress briefly to remind the reader of the
properties of an orthogonal linear transformation.
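
The stated mean and variance can be illustrated by simulation. The sketch below is not from the original text; the function name, the standard deviations, the number of trials and the seed are all assumptions chosen for the illustration.

```python
import random

def simulate_chi_square(sigmas, trials=20000, seed=0):
    """Return (mean, variance) of chi-square = sum of (x_i / sigma_i)^2 over many samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        # Each x_i is normal about zero with its own standard deviation sigma_i.
        values.append(sum((rng.gauss(0.0, s) / s) ** 2 for s in sigmas))
    mean = sum(values) / trials
    variance = sum((v - mean) ** 2 for v in values) / trials
    return mean, variance

# Five variates with assumed, unequal standard deviations, so that n = 5.
mean, variance = simulate_chi_square([1.0, 2.0, 0.5, 3.0, 1.5])
print(mean, variance)   # should lie close to n = 5 and 2n = 10
```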

75. Orthogonal linear transformation 

Let the $n$ variates $x_i$ be subjected to the linear transformation

$$\xi_i = \sum_j c_{ij} x_j \qquad (i, j = 1, \ldots, n). \tag{3}$$

If the constant coefficients $c_{ij}$ are such that

$$\sum_i \xi_i^2 = \sum_i x_i^2, \tag{4}$$

the transformation is said to be orthogonal. In this case the
coefficients satisfy the relations

$$\sum_i c_{ij}^2 = 1 = \sum_i c_{ji}^2 \tag{5}$$

and

$$\sum_i c_{ij} c_{ik} = 0 = \sum_i c_{ji} c_{ki} \qquad (j \ne k). \tag{6}$$

These relations may be expressed verbally by saying that, in the
determinant $|c_{ij}|$ of the coefficients, the sum of the squares of the
elements in any row or in any column is equal to unity, while the
sum of the products of corresponding elements in any two rows or
in any two columns is equal to zero. The determinant of the
coefficients is equal to $\pm 1$; and consequently the Jacobian of the $\xi$'s
with respect to the $x$'s is also $\pm 1$. By changing the sign of one of
the $\xi$'s, if necessary, we may ensure that the value is $+1$; and we
shall assume in all cases that this has been done. It follows that

$$\frac{\partial(x_1, \ldots, x_n)}{\partial(\xi_1, \ldots, \xi_n)} = 1. \tag{7}$$


It is easy to prove that, if the $x$'s are statistically independent
variates, normally distributed about zero with unit s.d., so are
the $\xi$'s. For, the probability that simultaneously the $n$ values of
the variates $x_i$ will fall in the respective intervals $dx_i$ is the product
of the probabilities for the individual variates, and is therefore
given by

$$dP = (2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_i x_i^2\right) dx_1\,dx_2 \ldots dx_n.$$

The probability density for the joint distribution of the variates
$x_i$ is therefore

$$(2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_i x_i^2\right) = (2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_i \xi_i^2\right).$$

But the 'volume' of the element bounded by the $n$ pairs of
hypersurfaces $\xi_i$, $\xi_i + d\xi_i$ is

$$\frac{\partial(x_1, \ldots, x_n)}{\partial(\xi_1, \ldots, \xi_n)}\,d\xi_1 \ldots d\xi_n = d\xi_1 \ldots d\xi_n,$$

so that the probability that the $\xi_i$ will fall simultaneously in their
respective intervals $d\xi_i$ is expressible as

$$dP = (2\pi)^{-\frac12} \exp(-\tfrac12 \xi_1^2)\,d\xi_1 \ldots (2\pi)^{-\frac12} \exp(-\tfrac12 \xi_n^2)\,d\xi_n.$$

From the form of this expression it follows that the $\xi$'s are
statistically independent, and are normally distributed about zero with
unit s.d.

The transformation (3) corresponds to a rotation of rectangular
axes in Euclidean space of $n$ dimensions. In the choice of the
coefficients $c_{ij}$ there is a degree of arbitrariness. For instance the
coefficients $c_{1i}$ for $\xi_1$ may be chosen first, subject only to the condition

$$\sum_i c_{1i}^2 = 1,$$

which ensures that the constants $c_{1i}$ are the components of a unit
vector. When $\xi_1$ has been determined $\xi_2$ may be chosen in an infinity
of ways, subject only to the conditions

$$\sum_i c_{2i}^2 = 1, \qquad \sum_i c_{1i} c_{2i} = 0,$$

which express that $c_{2i}$ are components of a unit vector, orthogonal
to the vector whose components are $c_{1i}$. At each step there is freedom
of choice of an axis orthogonal to those already chosen; and only
in the case of the $n$th axis is there no freedom of choice.
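
The step-by-step choice of axes can be sketched with the Gram-Schmidt process. This is one possible construction, offered as an illustration rather than as the method of the text; the function names and the numerical values are assumptions.

```python
import math

def orthogonal_matrix(first_row):
    """Build an orthogonal matrix whose first row is first_row scaled to unit length.

    Later rows are obtained by Gram-Schmidt: each candidate axis has its
    components along the rows already chosen removed, and is then normalised.
    """
    n = len(first_row)
    unit_axes = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    rows = []
    for candidate in [list(first_row)] + unit_axes:
        v = candidate[:]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:                 # skip a candidate that depends on earlier rows
            rows.append([a / norm for a in v])
        if len(rows) == n:
            break
    return rows

def transform(rows, x):
    """Apply xi_i = sum_j c_ij x_j."""
    return [sum(c * xj for c, xj in zip(row, x)) for row in rows]

c = orthogonal_matrix([1.0, 2.0, 3.0, 4.0])   # assumed coefficients for the first axis
x = [0.5, -1.2, 2.0, 0.3]                     # assumed values of the variates
xi = transform(c, x)
print(sum(v * v for v in x), sum(v * v for v in xi))   # the two sums of squares agree
```

Taking the first row proportional to the coefficients of a linear relation makes the first new variate proportional to the left-hand member of that relation, which is the device used when a linear constraint is imposed.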

76. Linear constraints. Degrees of freedom 

As above let the variates $x_i$ be normally distributed about zero
as mean, with unit s.d. To begin with we assume that they are
functionally independent; but presently we shall impose on them
the linear restriction

$$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n = 0, \tag{8}$$

an equation which may be divided throughout by the constant
necessary to make $\sum_i a_i^2 = 1$; we assume that this has been done.

If the variates $x_i$ are independent, we may obtain by an orthogonal
linear transformation a set of $n$ statistically independent variates
$\xi_i$, each of which is normally distributed about zero as mean, with
unit s.d.; and this may be done so that $\xi_1$ is identical with the first
member of (8). Now let the condition (8) be imposed on the $x_i$.
This is equivalent to putting $\xi_1 = 0$; and, since the variates
$\xi_2, \ldots, \xi_n$ are statistically independent of $\xi_1$, their distributions are
unaltered by the condition so imposed. Consequently

$$\tfrac12\chi^2 = \tfrac12 \sum_{i=1}^{n} x_i^2 = \tfrac12 \sum_{i=1}^{n} \xi_i^2 = \sum_{i=2}^{n} \tfrac12 \xi_i^2,$$

where the $\tfrac12\xi_i^2$ are Gamma variates, each with parameter $\tfrac12$. Thus
$\tfrac12\chi^2$ is the sum of $n - 1$ independent $\gamma(\tfrac12)$ variates, and is therefore itself
a Gamma variate with parameter $\tfrac12(n-1)$. The distribution of $\chi^2$
is therefore obtained from (2) by putting $n - 1$ in place of $n$. The
condition (8) has reduced the number of independent variates by
one. The number of independent variates is usually called the
number of degrees of freedom (d.f.), or briefly the number of freedoms.
The term is borrowed from geometry and mechanics, where the
position of a point or of a body is specified by a number of functionally
independent variables called coordinates. Each independent
coordinate corresponds to one degree of freedom of movement. Any
constraint on the body reduces the number of degrees of freedom.
For this reason a linear relation between the variates is called a
linear constraint. We shall assume that only linear constraints are
involved.

An appeal to geometry throws light on the above reasoning. If
the variables $x_i$ are regarded as rectangular Cartesian coordinates
of the current point $P$ in Euclidean space of $n$ dimensions, the
square of the distance of $P$ from the origin $O$ is

$$OP^2 = \sum_i x_i^2 = \chi^2,$$

and, in terms of the alternative set of coordinates $\xi_i$ relative to
rectangular axes through the same origin,

$$OP^2 = \sum_i \xi_i^2.$$

When the variables are connected by the equation (8), the point $P$
is constrained to lie on a hyperplane through the origin, determined
by this equation. In terms of the $\xi$'s the equation of the hyperplane
is simply

$$\xi_1 = 0,$$

and for a point $P$ on this hyperplane we have

$$\chi^2 = OP^2 = \sum_{i=2}^{n} \xi_i^2,$$

as above. Thus the linear constraint (8) restricts the freedom of
movement of $P$ to the hyperplane, in which there are only $n - 1$
independent coordinates. $\chi^2$ is then said to correspond to $n - 1$ d.f.

That each linear constraint reduces the number of freedoms by
unity may be shown in a similar manner. Thus if there is a second
linear constraint

$$b_1 x_1 + b_2 x_2 + \ldots + b_n x_n = 0, \tag{9}$$

it is expressible in terms of the $\xi$'s in the form

$$\beta_2 \xi_2 + \beta_3 \xi_3 + \ldots + \beta_n \xi_n = 0, \tag{10}$$

in which $\sum_{i=2}^{n} \beta_i^2 = 1$. We obtain $\chi^2$ as the sum of squares of $n - 1$
standard normal variates $\xi_i$ ($i = 2, \ldots, n$), which are now connected
by the linear constraint (10). Proceeding as before by an appropriate
orthogonal transformation

$$\zeta_i = \sum_{j=2}^{n} d_{ij}\,\xi_j,$$

in which $\zeta_2$ is identical with the first member of (10), we may
express $\chi^2$ as

$$\chi^2 = \sum_{i=3}^{n} \zeta_i^2,$$

and $\tfrac12\chi^2$ is thus a Gamma variate with parameter $\tfrac12(n - 2)$, so that $\chi^2$
corresponds to $n - 2$ d.f. The argument holds for any number of
linear constraints. Following Yule and Kendall* we shall denote
the number of freedoms by $\nu$. Then, if the $n$ variates are subject to
$m$ linear constraints,

$$\nu = n - m. \tag{11}$$

In place of (2) we thus have for the distribution of $\chi^2$ corresponding
to $\nu$ d.f.,

$$dP = \frac{1}{2^{\nu/2}\,\Gamma(\tfrac12\nu)}\,(\chi^2)^{\frac12\nu - 1} e^{-\frac12\chi^2}\,d(\chi^2). \tag{12}$$

* 1937, 1, p. 416.




Since $\tfrac12\chi^2$ is a $\gamma(\tfrac12\nu)$ variate, the mean value of $\chi^2$ is $\nu$ and its modal
value is $\nu - 2$. The probability that the value of $\chi^2$ from a random
sample will not exceed a fixed value $\chi_0^2$ is obtained by integrating
(12) with respect to $\chi^2$ from 0 to $\chi_0^2$. Similarly, the probability that
$\chi^2$ will exceed $\chi_0^2$ is the integral* of (12) with respect to $\chi^2$ from
$\chi_0^2$ to $\infty$. The accompanying Table 3 gives the values of $\chi_0^2$ for values
of $\nu$ from 1 to 30, and for various fixed values of the probability $P$
of exceeding $\chi_0^2$. In other words, $P$ is the probability that in random
sampling the value of $\chi^2$ shown in the body of the table will be
exceeded.

Example. Show that, for 2 d.f., the probability $P$ of a value of $\chi^2$ greater
than $\chi_0^2$ is $\exp(-\tfrac12\chi_0^2)$, and hence that $\chi_0^2 = 2\log_e(1/P)$.
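
The closed form for 2 d.f. can be compared directly with the corresponding row of Table 3. The following is a small sketch; the function name is an assumption made for the illustration.

```python
import math

def chi_square_cutoff_2df(p):
    """Value exceeded with probability p by a chi-square variate with 2 d.f."""
    return 2.0 * math.log(1.0 / p)

for p in (0.99, 0.95, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01):
    print(p, round(chi_square_cutoff_2df(p), 3))
```

The printed values agree with the tabulated row for 2 d.f. to the accuracy of the table.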

77. Distribution of the 'sum of squares' for a random sample
from a normal population

Consider a random sample of $n$ independent values from a
normal population of variance $\sigma^2$. If as usual $\bar{x}$ denotes the mean
of the sample, the sum of squares of the deviations from the mean is

$$nS^2 = \sum_i (x_i - \bar{x})^2. \tag{13}$$

This is the 'sum of squares' to be considered; and we shall prove
that $nS^2/\sigma^2$ is distributed like $\chi^2$ for $n - 1$ d.f. We know that

$$\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n\bar{x}^2 = nS^2 + n\bar{x}^2. \tag{14}$$

If then we introduce an orthogonal linear transformation (3) of the
variables $x_i$ such† that $\xi_1 = \bar{x}\sqrt{n}$, we have by (14)

$$nS^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} \xi_i^2 - \xi_1^2 = \sum_{i=2}^{n} \xi_i^2,$$

and therefore

$$nS^2/\sigma^2 = \sum_{i=2}^{n} \xi_i^2/\sigma^2. \tag{15}$$

Now, with origin at the mean of the population, the $x$'s are normally
distributed about zero as mean with s.d. $\sigma$, and so also are the $\xi$'s.

* For a method of evaluating this integral see Fisher, 1935, 1, pp. 366-7.
See also Ex. IX, 9.

† Cf. Sawkins, 1940, 3, p. 225.




Consequently $nS^2/2\sigma^2$ is a Gamma variate with parameter $\tfrac12(n - 1)$,
and its distribution is

$$dP = \frac{1}{\Gamma(\tfrac12(n-1))} \left(\frac{nS^2}{2\sigma^2}\right)^{\frac12(n-3)} \exp\left(-\frac{nS^2}{2\sigma^2}\right) d\left(\frac{nS^2}{2\sigma^2}\right).$$

In other words $nS^2/\sigma^2$ is distributed like $\chi^2$ with $n - 1$ d.f.

It is convenient to express the result in terms of the statistic $s^2$
of § 64, which is an unbiased estimate of the variance of the
population. Thus

$$nS^2 = (n - 1)s^2 = \nu s^2,$$

and the distribution of $s^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac12\nu)} \left(\frac{\nu s^2}{2\sigma^2}\right)^{\frac12(\nu-2)} \exp\left(-\frac{\nu s^2}{2\sigma^2}\right) \frac{\nu}{2\sigma^2}\,d(s^2), \tag{16}$$

the estimate being based on $\nu$ d.f.

The coefficient of $d(s^2)$ in (16) is the probability density in the
distribution of $s^2$. Suppose that the population variance $\sigma^2$ is unknown,
and that we enquire what value of it would make the probability
density of $s^2$ a maximum for the given value of $s^2$. This is obtained
by equating to zero its derivative with respect to $\sigma^2$. The procedure
leads to $\sigma^2 = s^2$. For this reason $s^2$ is called the optimum value of
the population variance corresponding to the given sample; and the
method of obtaining it is called the method of maximum likelihood.
It is due to R. A. Fisher.
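
The maximisation can be illustrated by a direct search. This sketch assumes that the density of $s^2$ for $\nu$ d.f. is that obtained from the statement that $\nu s^2/2\sigma^2$ is a Gamma variate with parameter $\tfrac12\nu$; the function names, the grid and the numerical values are illustrative assumptions.

```python
import math

def log_density_s2(s2, sigma2, nu):
    """Logarithm of the probability density of s^2 when the population variance is sigma2.

    Assumes nu * s2 / (2 * sigma2) is a Gamma variate with parameter nu / 2.
    """
    half_nu = nu / 2.0
    return (half_nu * math.log(half_nu / sigma2)
            + (half_nu - 1.0) * math.log(s2)
            - half_nu * s2 / sigma2
            - math.lgamma(half_nu))

def best_sigma2(s2, nu, grid):
    """Grid value of sigma2 that makes the density of the observed s2 largest."""
    return max(grid, key=lambda sigma2: log_density_s2(s2, sigma2, nu))

grid = [0.01 * k for k in range(1, 3001)]        # candidate variances 0.01 to 30.00
print(best_sigma2(10.62, 11, grid))              # the maximum falls at sigma2 = s2
```

A grid search is used only to keep the sketch free of calculus; equating the derivative with respect to the variance to zero gives the same value exactly.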

In the above argument $\xi_1$ is independent of $\xi_2, \ldots, \xi_n$;
therefore, in virtue of (15), $\bar{x}$ is independent of $S^2$. Thus the sampling
distributions of the mean and the variance are independent. And,
since $\xi_1/\sigma$ is a standard normal variate, so is $\bar{x}\sqrt{n}/\sigma$. It follows that
$\bar{x}$ is normally distributed with the same mean as the population,
and with variance $\sigma^2/n$.


78. Nature of the chi-square test. An illustration

The $\chi^2$ test is a means of judging the credibility of an hypothesis
concerning the population (or populations) from which the values
of the sample (or samples) are drawn. The hypothesis to be tested
must be of such a nature that we can determine, from the sample
values of the variate, the corresponding value of a certain statistic,
which is distributed like $\chi^2$ for a known number of freedoms. The
table then tells us the probability $P$ that, in random sampling, a
value of this statistic will occur greater than the value actually
obtained. If this probability is very small, we regard the value
obtained as significantly large, and conclude that the hypothesis is
probably incorrect.

Table 3. Values of $\chi^2$ with probability $P$ of
being exceeded in random sampling

$\nu$ = number of degrees of freedom

  ν \ P     0.99     0.95     0.50     0.30     0.20     0.10     0.05     0.01

    1     0.0002    0.004     0.46     1.07     1.64     2.71     3.84     6.64
    2      0.020    0.103     1.39     2.41     3.22     4.60     5.99     9.21
    3      0.115     0.35     2.37     3.66     4.64     6.25     7.82    11.34
    4       0.30     0.71     3.36     4.88     5.99     7.78     9.49    13.28
    5       0.55     1.14     4.35     6.06     7.29     9.24    11.07    15.09
    6       0.87     1.64     5.35     7.23     8.56    10.64    12.59    16.81
    7       1.24     2.17     6.35     8.38     9.80    12.02    14.07    18.48
    8       1.65     2.73     7.34     9.52    11.03    13.36    15.51    20.09
    9       2.09     3.32     8.34    10.66    12.24    14.68    16.92    21.67
   10       2.56     3.94     9.34    11.78    13.44    15.99    18.31    23.21
   11       3.05     4.58    10.34    12.90    14.63    17.28    19.68    24.72
   12       3.57     5.23    11.34    14.01    15.81    18.55    21.03    26.22
   13       4.11     5.89    12.34    15.12    16.98    19.81    22.36    27.69
   14       4.66     6.57    13.34    16.22    18.15    21.06    23.68    29.14
   15       5.23     7.26    14.34    17.32    19.31    22.31    25.00    30.58
   16       5.81     7.96    15.34    18.42    20.46    23.54    26.30    32.00
   17       6.41     8.67    16.34    19.51    21.62    24.77    27.59    33.41
   18       7.02     9.39    17.34    20.60    22.76    25.99    28.87    34.80
   19       7.63    10.12    18.34    21.69    23.90    27.20    30.14    36.19
   20       8.26    10.85    19.34    22.78    25.04    28.41    31.41    37.57
   21       8.90    11.59    20.34    23.86    26.17    29.62    32.67    38.93
   22       9.54    12.34    21.34    24.94    27.30    30.81    33.92    40.29
   23      10.20    13.09    22.34    26.02    28.43    32.01    35.17    41.64
   24      10.86    13.85    23.34    27.10    29.55    33.20    36.42    42.98
   25      11.52    14.61    24.34    28.17    30.68    34.38    37.65    44.31
   26      12.20    15.38    25.34    29.25    31.80    35.56    38.88    45.64
   27      12.88    16.15    26.34    30.32    32.91    36.74    40.11    46.96
   28      13.56    16.93    27.34    31.39    34.03    37.92    41.34    48.28
   29      14.26    17.71    28.34    32.46    35.14    39.09    42.56    49.59
   30      14.95    18.49    29.34    33.53    36.25    40.26    43.77    50.89

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

Two conventional values of $P$ are employed in deciding
significance, viz. 0.05 and 0.01. These determine the 5 % and the 1 %
levels of significance respectively. If the value of $P$ obtained is
greater than 0.05 we infer that, in more than 5 % of random samples
from the population in question, the value of $\chi^2$ obtained would be
greater than that actually found, which is therefore regarded as not
significantly large. If, however, $P$ is less than 0.05 the value found
is regarded as significant at that level. Similar remarks apply to the
1 % level of significance. Values which are significant at the 1 %
level of probability are said to be highly significant, and are
sometimes distinguished by a double asterisk. Significance at the 5 %
level is then denoted by a single asterisk.

The value of $\chi^2$ obtained from the sample may also be
significantly small. A glance at the probability curve of $\chi^2$ will help to
make this point clear. When the number $\nu$ of degrees of freedom is
greater than 2, this curve has the form indicated in the diagram,
except that when $\nu = 3$ it touches the $y$-axis at the origin, and when
$\nu = 4$ it touches neither axis at that point. The ordinate for any
value of $\chi^2$ is the probability density for that value; and this is small
for small values of $\chi^2$. When the value of $P$ found from the table is
greater than 0.95, the probability of obtaining a smaller value of $\chi^2$
is less than 5 %, and the sample value must be regarded as
significantly small. Similarly, when $P$ is greater than 0.99 the probability
of a smaller value of $\chi^2$ is less than 1 %, and the smallness of the
sample value is highly significant.

As an illustration of the test we may consider the following.
Let $x_i$ be a random sample of $n$ values from a normal population
of variance $\sigma^2$, $\bar{x}$ the sample mean and

$$nS^2 = \sum_i (x_i - \bar{x})^2.$$

Then we know that $nS^2/\sigma^2$ is distributed like $\chi^2$ with $n - 1$ d.f.
If then we are given the sample values $x_i$, but have no certain
knowledge of the population from which they were drawn, we may test
the hypothesis that they came from a normal population of s.d. $\sigma$.
For, when $S^2$ has been found from the sample, our hypothesis gives
$nS^2/\sigma^2$ as the sample value of $\chi^2$, and the table then enables us to
decide whether this value is significant or not, that is to say whether
our hypothesis is improbable or not.

Example. A random sample of 12 values gave an unbiased estimate $s^2$ of
the population variance equal to 10.62 mm.² May the sample be reasonably
regarded as from a normal population with variance 7 mm.²?

Here $nS^2 = 11 \times 10.62 = 116.82$, and, according to our hypothesis
concerning the population,

$$\chi^2 = 116.82/7 = 16.7.$$

The number of d.f. is 11. From the table we see that the probability, $P$, that
the value of $\chi^2$ in such samples will exceed 16.7 is greater than 0.10. The
value found is therefore not significant, and the test provides no evidence
against the hypothesis of a normal population with variance 7.
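
The probability read from the table can also be computed. The sketch below is an illustration and not the method of the text: it evaluates the tail probability for an integral number of d.f. from the standard recurrence that adds one term for every two degrees of freedom, starting from the closed forms for 1 and 2 d.f. The function name is an assumption.

```python
import math

def chi_square_tail(x, nu):
    """Probability that a chi-square variate with nu d.f. (nu a positive integer) exceeds x."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if nu % 2 == 0:
        p, k = math.exp(-half), 2                 # closed form for 2 d.f.
    else:
        p, k = math.erfc(math.sqrt(half)), 1      # closed form for 1 d.f.
    while k < nu:
        # Passing from k to k + 2 d.f. adds (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
        p += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p

s2, sigma2, nu = 10.62, 7.0, 11      # values taken from the example
chi2 = nu * s2 / sigma2
print(round(chi2, 1), chi_square_tail(chi2, nu))
```

The computed probability lies between 0.10 and 0.20, in agreement with the conclusion drawn from the table.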

79. Test of goodness of fit 

We shall now consider the test of goodness of fit devised by
Karl Pearson. As in Chapter VII let the population be one whose
members may be separated into a number $k$ of classes, and let $p_i$
be the relative frequency of the $i$th class. In the choice of a simple
sample of $n$ members from this population, the probability at any
drawing that the individual selected will belong to the $i$th class is
$p_i$. The frequency $f_i$ of this class in sampling has a mean value
$m_i$ given by

$$m_i = n p_i, \tag{17}$$

and the distribution of $f_i$ is the binomial. For large values of $n$ the
binomial distribution approximates to normal; and the sampling
distribution of any class frequency is therefore approximately
normal for large samples.

Next suppose that we are given a set of class frequencies $f_i$,
whose sum is $n$, without any information about the source of the
values. We may wish to enquire whether they may reasonably be
regarded as those of a simple sample from a certain hypothetical
population. The population is hypothetical in the sense that it is
determined by an hypothesis, which must be such as to enable us to
calculate the expected values $m_i$ of the class frequencies in random
sampling from it. We shall prove that, for large samples, the variate
$\sum_i (f_i - m_i)^2/m_i$ conforms approximately to the $\chi^2$ distribution, and
shall show how the number of degrees of freedom is determined.
The value of $\chi^2$ obtained from the sample enables us to test the
credibility of our hypothesis.

Since, for large samples, the class frequency $f_i$ is distributed
approximately normally about $m_i$ as mean, the variate

$$x_i = f_i - m_i \tag{18}$$

is distributed approximately normally about zero as mean. We
require the variance $\sigma_i^2$ of $x_i$ when the class frequencies are entirely
independent and, to obtain this, we must take account* of the
variation in their sum $n$. Thus

$$n = \sum_i f_i, \tag{19}$$

and the size $n$ of the sample varies about a mean $\bar{n}$ with s.d. $\sigma_n$.
Since the frequencies $f_i$ are supposed independent, the variance of
$n$ is given by

$$\sigma_n^2 = \sum_i \sigma_i^2. \tag{20}$$

Now, in virtue of § 47 (8), the variance $\sigma_i^2$ for samples of varying
size is

$$\sigma_i^2 = \bar{n} p_i q_i + \sigma_n^2 p_i^2. \tag{21}$$

Summing for all the classes we have, by (20),

$$\sigma_n^2 = \bar{n} \sum_i p_i q_i + \sigma_n^2 \sum_i p_i^2,$$

or, since $\sum_i p_i = 1$,

$$\sigma_n^2 \left(1 - \sum_i p_i^2\right) = \bar{n} \left(1 - \sum_i p_i^2\right).$$

Hence $\sigma_n^2 = \bar{n}$, so that (21) is equivalent to

$$\sigma_i^2 = (p_i q_i + p_i^2)\,\bar{n} = p_i \bar{n} = m_i. \tag{22}$$

The quantity $\chi^2$ defined by

$$\chi^2 = \sum_i \frac{x_i^2}{\sigma_i^2} = \sum_i \frac{(f_i - m_i)^2}{m_i} \tag{23}$$

is a sum of squares of standard approximately normal deviates,
and is therefore distributed approximately like $\chi^2$.

* Cf. Fisher, 1922, 1, p. 88.




For samples of fixed size, the number of d.f. is clearly less than
the number $k$ of classes. For the sum of the class frequencies is
constant, and this corresponds to a linear constraint on the variates
$x_i$. Further, to determine the theoretical class frequencies it
is sometimes necessary to estimate parameters of the population
from the data of the sample. For instance, in testing the hypothesis
of a normal population, it may be necessary to estimate the mean
and the variance of the population from the sample values. Each
estimate of a parameter obtained in this manner corresponds to the
introduction of a linear constraint. For the moments about the
mean are linear, or approximately linear, functions of the class
frequencies; and, in equating such a function to a parameter, we
introduce an approximately linear constraint on the variates $x_i$.
In calculating the number of freedoms of $\chi^2$, each constraint
introduced in this manner must be recognized.

The approximation of the binomial distribution to normal, when
$n$ is large, does not hold for very small values of $p$ or $q$. Hence the
above argument is not valid if one of the class frequencies is small;
for that would make the corresponding relative frequency $f/n$ very
small, and therefore the probability $p$ for that class in sampling
from the population also very small. Classes of small frequency
may be treated by combining two or more of them to form a class
sufficiently large.
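
The loss of one degree of freedom for samples of fixed size can be illustrated by simulation. The sketch below is not from the original text; the class probabilities, sample size, number of trials, seed and function names are all assumptions. For $k$ classes and no estimated parameters, the mean value of Pearson's statistic is $k - 1$.

```python
import random

def pearson_statistic(freqs, probs):
    """Sum of (f_i - m_i)^2 / m_i with m_i = n p_i, n being the sample size."""
    n = sum(freqs)
    return sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, probs))

def mean_statistic(probs, n, trials, seed=0):
    """Average of Pearson's statistic over repeated simple samples of fixed size n."""
    rng = random.Random(seed)
    k = len(probs)
    total = 0.0
    for _ in range(trials):
        freqs = [0] * k
        for cls in rng.choices(range(k), weights=probs, k=n):
            freqs[cls] += 1
        total += pearson_statistic(freqs, probs)
    return total / trials

probs = [0.1, 0.2, 0.3, 0.4]                       # assumed hypothetical population, k = 4
print(mean_statistic(probs, n=200, trials=2000))   # close to k - 1 = 3
```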


80. Numerical examples 


Example 1. Can the wages of 1,000 employees, given in Ex. I, 3 (p. 17), be
regarded as a random sample from a normal population?

Our hypothesis is that the population is normal. Since the sample is large
its mean and its variance are taken as estimates of those of the population.
In Ex. III, 6 (p. 60) the class frequencies per thousand of the normal population
are given. On account of the smallness of the extreme frequencies we combine
the first two classes, and also the last two, leaving 13 classes. Since the sum
of the class frequencies is constant, and two of the parameters were estimated
from the sample, the number of degrees of freedom is 10. The value of $\chi^2$
from the sample is


$$\chi^2 = \frac{6^2}{18} + \frac{10^2}{26} + 0 + \frac{14^2}{79} + \ldots + \frac{(3.2)^2}{18} = 19.84.$$


From the table we see that, for 10 d.f., this value of $\chi^2$ is significant at the
5 % level. We conclude that the assumption of a normal population is
probably incorrect.




Example 2. From the adult male populations of seven large cities, random
samples of the sizes indicated below were taken, and the numbers of married
and single men recorded. Do the data indicate any significant variation
among the cities in the tendency of men to marry?

  City        A     B     C     D     E     F     G   Total

  Married    133   164   155   106   153   123   146    980
  Single      36    57    40    37    55    33    36    294
  Total      169   221   195   143   208   156   182   1274

We test the hypothesis that there is no significant variation in the
tendency mentioned. Then the men from each city may be regarded as a simple
sample from a population in which the ratio of married men to single is
approximately the same as in the column of totals. This ratio is 10:3. The
theoretical frequencies for any city are then obtained by dividing the total
for that city into two parts in this ratio. These frequencies are:

  City        A     B     C     D     E     F     G   Total

  Married    130   170   150   110   160   120   140    980
  Single      39    51    45    33    48    36    42    294
  Total      169   221   195   143   208   156   182   1274

From these figures we have

$$\chi^2 = \frac{3^2}{130} + \frac{6^2}{170} + \frac{5^2}{150} + \ldots + \frac{6^2}{42} = 5.34.$$

To find the number of freedoms we observe that the sum of the frequencies
of married and single men from any city is constant, being equal to the size
of the sample from that city. This reduces the number of independent
frequencies to 7. And further, a parameter of the population was estimated from
the sample, namely, the ratio of the numbers of married and single men.
Consequently $\nu = 6$. For this number of freedoms the probability of obtaining
a larger value of $\chi^2$ than 5.34 is about 0.50. The value is therefore not
significant, and the test furnishes no evidence against the hypothesis.
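
The arithmetic of this example can be reproduced as follows. The sketch is illustrative: the function name is an assumption, and the observed counts are those of the table as read with entries consistent with the stated row and column totals.

```python
married = [133, 164, 155, 106, 153, 123, 146]   # cities A to G
single = [36, 57, 40, 37, 55, 33, 36]

def chi_square_two_rows(row_a, row_b):
    """Chi-square for a two-row table, with theoretical frequencies taken in the
    ratio of the row totals, and the corresponding number of freedoms."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        expected_a = column * total_a / grand
        expected_b = column * total_b / grand
        chi2 += (a - expected_a) ** 2 / expected_a + (b - expected_b) ** 2 / expected_b
    # One constraint per column total, and one more for the estimated ratio.
    freedoms = len(row_a) - 1
    return chi2, freedoms

chi2, freedoms = chi_square_two_rows(married, single)
print(round(chi2, 2), freedoms)
```

The value obtained is close to the tabulated median for 6 d.f., which is why the probability of a larger value is about one half.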


Example 3. May the data in Ex. III, 7 (p. 61) be regarded as those of a
random sample from a Poissonian distribution?

The mean, $m = 1.2$, was estimated from the sample, and from it the
theoretical frequencies per thousand of the Poissonian distribution were
calculated in the example referred to. To apply the test we combine the
last three classes. Then

$$\chi^2 = \frac{(3.8)^2}{301.2} + \frac{(3.6)^2}{361.4} + \ldots + \frac{(4.4)^2}{7.6} = 3.6.$$

Here the number of classes is 6; but the total frequency is constant, and $m$
was estimated from the sample, so that $\nu = 4$. With 4 d.f. the probability
that $\chi^2$ will exceed 3.6 is nearly 0.50. The value is therefore not at all
significant, and the assumption of a Poissonian population is not discredited.
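
Two steps of this example lend themselves to a short check: the theoretical frequencies per thousand for a Poissonian distribution of mean 1.2, and the probability of exceeding the observed value with 4 d.f. The sketch below uses the closed form of the tail probability for 4 d.f., $e^{-x/2}(1 + x/2)$; the function names are assumptions, and the observed frequencies themselves are not reproduced here.

```python
import math

def poisson_per_thousand(mean, k):
    """Theoretical frequency per thousand of the value k in a Poissonian distribution."""
    return 1000.0 * math.exp(-mean) * mean ** k / math.factorial(k)

def chi_square_tail_4df(x):
    """Probability that a chi-square variate with 4 d.f. exceeds x."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

for k in range(8):
    print(k, round(poisson_per_thousand(1.2, k), 1))

print(round(chi_square_tail_4df(3.6), 3))
```

The first two frequencies reproduce the denominators 301.2 and 361.4 in the expression for chi-square, and the tail probability comes out a little below one half.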




81. Additive property of chi-square 

Theorem I. If the independent variates $x$ and $y$ conform to the $\chi^2$
distribution with $\nu_1$ and $\nu_2$ d.f. respectively, then $x + y$ is distributed
like $\chi^2$ with $\nu_1 + \nu_2$ d.f.

For $\tfrac12 x$ and $\tfrac12 y$ are independent Gamma variates with parameters
$\tfrac12\nu_1$ and $\tfrac12\nu_2$ respectively. Therefore, by §68, Theorem II, $\tfrac12(x + y)$ is
a Gamma variate with parameter $\tfrac12(\nu_1 + \nu_2)$. Consequently $x + y$
conforms to the $\chi^2$ distribution with $\nu_1 + \nu_2$ d.f.

Similarly, corresponding to §68, Theorem III, we may state

Theorem II. If the sum of two independent positive variates is
distributed like $\chi^2$ with $\nu_1 + \nu_2$ d.f., and one of them is distributed like
$\chi^2$ with $\nu_1$ d.f., then the other is distributed like $\chi^2$ with $\nu_2$ d.f.

In the same way Theorem V of §69 and Theorem VIII of §73
may be expressed as

Theorem III. If the independent variates $x$ and $y$ are distributed
like $\chi^2$ with $\nu_1$ and $\nu_2$ d.f. respectively, then $x/(x + y)$ is a Beta variate
of the first kind with parameters $\tfrac12\nu_1$ and $\tfrac12\nu_2$, while $x/y$ is a Beta variate
of the second kind with the same parameters.
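The additive property can be illustrated by simulation. The following is a rough sketch only: it builds each $\chi^2$ variate as a sum of squared standard normal variates, and compares the sample mean and variance of the sum with the values $\nu_1 + \nu_2$ and $2(\nu_1 + \nu_2)$ that a $\chi^2$ variate with $\nu_1 + \nu_2$ d.f. is known to have. The seed, the sample size and the helper names are arbitrary choices, not part of the text.

```python
import random

def chi_square_draw(rng, dof):
    """One chi-square value with `dof` d.f.: a sum of squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

def sample_mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(12345)          # fixed seed, chosen arbitrarily
nu1, nu2, trials = 3, 5, 20000
sums = [chi_square_draw(rng, nu1) + chi_square_draw(rng, nu2)
        for _ in range(trials)]
mean_sum, var_sum = sample_mean_and_variance(sums)
print(mean_sum, var_sum)            # should lie near 8 and 16 respectively
```

Agreement of two moments does not prove the theorem, but a marked disagreement would reveal an error in the reasoning or in the code.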

82. Samples from an uncorrelated bivariate normal population. 
Distribution of the correlation coefficient 

The distribution of the coefficient of correlation, $r$, in samples
from a normally correlated bivariate population was given by
Fisher* in 1915. In the particular case of an uncorrelated population
($\rho = 0$) the distribution of $r$ is very simple. Let $\sigma_1$ and $\sigma_2$ be
the s.d.'s of $x$ and $y$ in the population, and let the variates be measured
from their means. Consider a random sample of $n$ pairs of
values $x_i$, $y_i$ from such a population. The variances $S_1^2$, $S_2^2$ of $x$, $y$
in the sample are given by

$$nS_1^2 = \sum_i (x_i - \bar{x})^2, \qquad nS_2^2 = \sum_i (y_i - \bar{y})^2,$$

and the correlation, $r$, in the sample is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{nS_1S_2}.$$

* Fisher, 1915, 2. A simple and excellent proof of a different character
has recently been published by Sawkins (1944, 1). The proof for $\rho = 0$ given
in this section is based on that of Sawkins.


Owing to the nature of the sampling the $n$ values $y_i$ are independent.
Let these be subjected to an orthogonal transformation
yielding $n$ variates $\eta_i$, the first of which may be taken as

$$\eta_1 = \frac{1}{\sqrt{n}} \sum_i y_i = \sqrt{n}\,\bar{y},$$

since the sum of squares of the coefficients of the $y_i$ is unity. Then

$$\sum_{1}^{n} \eta_i^2 = \sum_{1}^{n} y_i^2 = \sum_{1}^{n} (y_i - \bar{y})^2 + n\bar{y}^2 = nS_2^2 + \eta_1^2,$$

so that

$$nS_2^2 = \sum_{2}^{n} \eta_i^2, \qquad \text{(i)}$$

and this sum, divided by $\sigma_2^2$, is distributed like $\chi^2$ with $n - 1$ d.f.

Further, from the above definition of $r$,

$$\sqrt{n}\, r S_2 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{n}\, S_1 = \sum_i (x_i - \bar{x})\, y_i / \sqrt{n}\, S_1.$$

We may take this sum for the second variate, $\eta_2$, for the sum of
the squares of the coefficients of the $y_i$ is unity, the orthogonal
condition is satisfied by the coefficients of $\eta_1$ and $\eta_2$, and the
variables $x$ and $y$ are independent. Since $\eta_2^2 = nr^2S_2^2$ it follows from
(i) that

$$\sum_{3}^{n} \eta_i^2 = n(1 - r^2)S_2^2.$$

Further, the $\eta_i/\sigma_2$ are independent standard normal variates. Thus
$nr^2S_2^2/\sigma_2^2$ and $n(1 - r^2)S_2^2/\sigma_2^2$ are distributed independently like
$\chi^2$ with 1 and $n - 2$ d.f. respectively; or we may express it by saying
that the former is a $\chi_1^2$ and the latter a $\chi_{n-2}^2$. It follows that

$$r^2 = \frac{nr^2S_2^2}{nS_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nr^2S_2^2/\sigma_2^2 + n(1 - r^2)S_2^2/\sigma_2^2} = \frac{\chi_1^2}{\chi_1^2 + \chi_{n-2}^2},$$

and therefore, by §81, Theorem III, $r^2$ is a $\beta_1(\tfrac12, \tfrac12(n - 2))$ variate,
with distribution

$$dp = \frac{(r^2)^{-1/2} (1 - r^2)^{\frac12(n-4)}\, d(r^2)}{B(\tfrac12, \tfrac12(n - 2))}. \qquad (24)$$

We may thus state the important

Theorem IV. For random samples of $n$ pairs of values from an
uncorrelated bivariate normal population, $r^2$ is a Beta variate of the
first kind with parameters $\tfrac12$ and $\tfrac12(n - 2)$.

The expected value of $r^2$ for such samples is therefore $1/(n - 1)$,
and the s.d. of $r$ is $1/\sqrt{n - 1}$. And since $r$ ranges from $-1$ to 1, the
opposite values $\pm r$ giving the same value of $r^2$, the distribution
of $r$ is

$$dp = \frac{(1 - r^2)^{\frac12(n-4)}\, dr}{B(\tfrac12, \tfrac12(n - 2))}. \qquad (25)$$
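As a numerical check on (25), the density $(1 - r^2)^{\frac12(n-4)} / B(\tfrac12, \tfrac12(n - 2))$ should integrate to unity over $-1 \le r \le 1$, and the mean of $r^2$ under it should equal $1/(n - 1)$. The sketch below uses a plain midpoint rule; the choice $n = 10$ and the function names are illustrative assumptions.

```python
import math

def beta_function(l, m):
    """Complete Beta function B(l, m) expressed through Gamma functions."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)

def r_density(r, n):
    """Ordinate of the distribution (25) of r for samples of n pairs."""
    return (1.0 - r * r) ** (0.5 * (n - 4)) / beta_function(0.5, 0.5 * (n - 2))

def integrate(func, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

n = 10
total = integrate(lambda r: r_density(r, n), -1.0, 1.0)
mean_r2 = integrate(lambda r: r * r * r_density(r, n), -1.0, 1.0)
print(total, mean_r2, 1.0 / (n - 1))
```

For $n = 10$ the exponent is an integer, so the integrand is a polynomial and the midpoint rule is very accurate; for $n < 4$ the ordinate is unbounded at $r = \pm 1$ and a cruder agreement should be expected.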


In considering the linear regression of $y$ on $x$ we proved the
relation (cf. §27 (31))

$$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - Y_i)^2 + \sum_i (Y_i - \bar{y})^2, \qquad (26)$$

where $Y_i$ is the estimate of $y_i$ from the regression equation. In the
present notation this corresponds to the identity

$$nS_2^2 = n(1 - r^2)S_2^2 + nr^2S_2^2. \qquad (27)$$

We have just seen that, for samples from the above population,
the two sums in the second member are independent. Thus the sum of
squares of deviations from the line of regression is distributed independently
of the sum of squares of deviations due to regression in the sample.
When divided by $\sigma_2^2$ the three sums in (27) are distributed like $\chi^2$
with $n - 1$, $n - 2$ and 1 d.f. respectively, illustrating the additive
property of $\chi^2$ stated in §81.


83. Distribution of regression coefficients and correlation ratios 

The distribution of the linear regression coefficient, $b$, of $y$ on $x$
follows from the above results. For

$$b = rS_2/S_1,$$

so that

$$\frac{b^2\sigma_1^2}{\sigma_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nS_1^2/\sigma_1^2}.$$

As we have just seen, the numerator and denominator of the second
member are distributed independently like $\chi^2$ with 1 and $n - 1$ d.f.
respectively. Consequently, by Theorem III, the quotient is a
$\beta_2(\tfrac12, \tfrac12(n - 1))$ variate, and therefore $b^2\sigma_1^2/\sigma_2^2$ is a variate of the same
type. Hence we may state

Theorem V. In random samples of $n$ pairs of values from an
uncorrelated bivariate normal population, in which the standard
deviations of the variables are $\sigma_1$ and $\sigma_2$, the linear regression coefficient
$b$ of $y$ on $x$ is such that $b^2\sigma_1^2/\sigma_2^2$ is a Beta variate of the second kind with
parameters $\tfrac12$ and $\tfrac12(n - 1)$.

Since the mean value of a $\beta_2(l, m)$ variate is $l/(m - 1)$, the expected
value of $b^2$ in sampling is $\sigma_2^2/(n - 3)\sigma_1^2$. And, since the range of $b$ is
from $-\infty$ to $+\infty$, the distribution of $b$ is

$$dp = \frac{(\sigma_1/\sigma_2)\, db}{B(\tfrac12, \tfrac12(n - 1))\,(1 + b^2\sigma_1^2/\sigma_2^2)^{\frac12 n}}. \qquad (28)$$


The distribution of the correlation ratio, $\eta$, of $y$ on $x$ may be
determined from the corresponding resolution of the sum of squares.
With the notation of §34 let the subscript $i$ distinguish the particular
array of $y$'s, while $j$ indicates position in that array, $\bar{y}_i$ denoting the
mean of the $y$'s in the $i$th vertical array, and $n_i$ the frequency in
that array. Then, in virtue of §34, we have the resolution

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2, \qquad (29)$$

the various sums being equal to the corresponding terms of the
identity

$$nS_2^2 = n(1 - \eta^2)S_2^2 + n\eta^2S_2^2. \qquad (30)$$

Now the first member of (29), divided by $\sigma_2^2$, is distributed like
$\chi^2$ with $n - 1$ d.f. Further, since $\sum_j$ denotes summation over the values
of any particular array, $\sum_j (y_{ij} - \bar{y}_i)^2/\sigma_2^2$ is distributed like $\chi^2$ with
$n_i - 1$ d.f. Summing for all the $h$ arrays we see that the first sum in
the second member of (29), divided by $\sigma_2^2$, is a $\chi^2$ with $\sum_i (n_i - 1)$ or
$n - h$ d.f. And, because the means of the arrays are distributed
independently of their variances, it follows from §81, Theorem II,
that the last sum in (29), divided by $\sigma_2^2$, is a $\chi^2$ with $h - 1$ d.f.* We
may therefore write

$$\eta^2 = \frac{n\eta^2S_2^2}{nS_2^2} = \frac{\chi_{h-1}^2}{\chi_{h-1}^2 + \chi_{n-h}^2},$$

in which the subscript indicates the number of d.f. for the $\chi^2$.
Thus, in virtue of §81, Theorem III, $\eta^2$ is a $\beta_1(\tfrac12(h - 1), \tfrac12(n - h))$
variate, and we may state

Theorem VI. For random samples of $n$ pairs of values from an
uncorrelated bivariate normal population, the square of the correlation
ratio of $y$ on $x$ is a Beta variate of the first kind with parameters
$\tfrac12(h - 1)$ and $\tfrac12(n - h)$, where $h$ is the number of arrays of $y$'s.

The distribution of $\eta^2$ is thus

$$dp = \frac{(\eta^2)^{\frac12(h-3)} (1 - \eta^2)^{\frac12(n-h-2)}\, d(\eta^2)}{B(\tfrac12(h - 1), \tfrac12(n - h))}, \qquad (31)$$

and its mean value,* by §69 (23), is

$$E(\eta^2) = \frac{h - 1}{n - 1}. \qquad (32)$$

* See also Ex. 7 at the end of this chapter.


COLLATERAL READING 

Yule and Kendall, 1937, 1, chapter xxii.

Mills, 1938, 1, pp. 618-36. 

Fisher, 1938, 2, chapter iv. 

Aitken, 1939, 1, pp. 99-105. 

Rider, 1939, 6, chapter vii. 

Kenney, 1939, 3, part ii, pp. 164-71.
Tippett, 1931, 2, chapter iv. 

Kendall, 1943, 2, chapter xii.

Snedecor, 1938, 3, chapter ix. 

Goulden, 1939, 2, chapters ix and x. 

Fisher, 1922, 1 and 1935, 1, pp. 353-4, and 356-8. 
Pearson, K., 1900, 1 and 1922, 3. 

Sawkins, 1940, 3, 1941, 1 and 1944, 1. 


EXAMPLES IX 

1. Apply the $\chi^2$ test to show that the deaths of centenarians
recorded in the example of §18 may reasonably be regarded as a
random sample from a Poissonian population.

2. A certain hypothesis was tested by two similar experiments,
which gave $\chi^2$ = ... for $\nu = 9$ and $\chi^2 = 14.9$ for $\nu = 11$. Show that
the two experiments combined give less reason for confidence in the
hypothesis than either experiment alone.

* Cf. Fisher, 1922, 2, pp. 604-6.

3. Prove by mathematical induction that the sum of the squares
of $n$ independent standard normal variates conforms to the $\chi^2$
distribution (2) of §74. (See Fisher, 1935, 1, pp. 353-4.)

4. Geometrical proof of the $\chi^2$ distribution. The following geometrical
proof, that the sum of the squares of $n$ independent standard
normal variates $x_i$ has a distribution given by (2) of §74, is
due to Fisher (1935, 1, p. 354). As in §74 the probability that the
values of the variates will fall simultaneously in the respective
intervals $dx_i$ is

$$dp = (2\pi)^{-\frac12 n} \exp(-\tfrac12\chi^2)\, dx_1\, dx_2 \dots dx_n, \qquad \text{(i)}$$

where $\chi^2 = \sum_i x_i^2$.

Let us regard the $x_i$ as coordinates of the current point $P$ in Euclidean
space of $n$ dimensions. Then $dp$ is the probability that $P$ will fall
in the element of volume $dx_1\, dx_2 \dots dx_n$. The coefficient of this
element of volume in (i) is therefore the probability density for this
space, and it is proportional to $\exp(-\tfrac12\chi^2)$. Now we can express, in
terms of $\chi$ and $d\chi$, an element of volume in which the value of $\chi$
may be regarded as constant. For, since $\chi^2$ is the square of the
distance of $P$ from the origin, $\chi$ is constant over the surface of a
hypersphere with radius $\chi$ and centre at the origin. The volume
enclosed by this hypersphere is proportional to $\chi^n$, and the element
of volume between this and the adjacent hypersphere of radius
$\chi + d\chi$ is proportional to $d(\chi^n)$, that is to $\chi^{n-1}\, d\chi$. Using the above
value of the probability density, we see that the probability that
$P$ will fall in the region bounded by the two hyperspheres is
proportional to $\chi^{n-1} \exp(-\tfrac12\chi^2)\, d\chi$, and this is the probability that the
value of $\chi$ from the random sample will fall in the interval $d\chi$. The
above probability is clearly proportional to

$$(\tfrac12\chi^2)^{\frac12(n-2)} \exp(-\tfrac12\chi^2)\, d(\tfrac12\chi^2),$$

and, since $\chi^2$ must lie between 0 and $+\infty$, the constant factor is
$1/\Gamma(\tfrac12 n)$, so that the integral of the probability throughout this
range may be unity. Since $\chi^2$ lies in the interval $d(\chi^2)$ when $\chi$ lies in
the interval $d\chi$, it follows that the distribution of $\chi^2$ is given by
§74 (2).

For still another proof of the $\chi^2$ distribution see Plummer, 1940,
1, p. 246.

5. Homogeneity of several estimates of the population variance.

Suppose that $k$ independent samples have furnished estimates $s_i^2$ of
the population variance, based on $\nu_i$ ($i = 1, \dots, k$) d.f. Are these
estimates such that the samples may be regarded as drawn from
the same population? In other words, are the estimates homogeneous?

On the hypothesis that the samples are from the same population,
an unbiased estimate $s^2$ of the variance of the population is

$$s^2 = \frac{1}{\nu} \sum_i \nu_i s_i^2, \qquad \nu = \sum_i \nu_i,$$

since the expected value of this quantity is the variance of the
population, in virtue of §54 (8). Bartlett* has shown that the
statistic

$$\chi^2 = \frac{1}{C}\Big(\nu \log_e s^2 - \sum_i \nu_i \log_e s_i^2\Big), \quad \text{where} \quad C = 1 + \frac{1}{3(k - 1)}\Big(\sum_i \frac{1}{\nu_i} - \frac{1}{\nu}\Big),$$

is distributed approximately as $\chi^2$ with $k - 1$ d.f. The value of $\chi^2$
calculated from the data will tell whether the hypothesis of
homogeneity is reasonable.

6. Show that the estimates 3.8, 4.4, 8.1, 6.1, and 9.4 of the
population variance, based on 5, 8, 6, 7 and 4 d.f. respectively, may
be regarded as homogeneous according to the test of Ex. 5 ($k = 5$,
$\chi^2 = 1.45$).
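The arithmetic of Ex. 6 is easily mechanised. The sketch below assumes the usual form of Bartlett's statistic, $\chi^2 = (\nu \log_e s^2 - \sum \nu_i \log_e s_i^2)/C$ with $C = 1 + \{\sum 1/\nu_i - 1/\nu\}/3(k - 1)$ and $\nu = \sum \nu_i$; the function name is hypothetical.

```python
import math

def bartlett_chi_square(estimates, dofs):
    """Bartlett's statistic for k variance estimates and their d.f.

    Returns the statistic and its number of degrees of freedom, k - 1.
    """
    k = len(estimates)
    nu = sum(dofs)
    # Pooled (weighted) estimate of the common variance.
    pooled = sum(v * s2 for v, s2 in zip(dofs, estimates)) / nu
    numerator = nu * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(dofs, estimates))
    c = 1.0 + (sum(1.0 / v for v in dofs) - 1.0 / nu) / (3.0 * (k - 1))
    return numerator / c, k - 1

chi2, dof = bartlett_chi_square([3.8, 4.4, 8.1, 6.1, 9.4], [5, 8, 6, 7, 4])
print(round(chi2, 2), dof)
```

With full floating-point precision the statistic comes out within about 0.01 of the value 1.45 quoted in Ex. 6, the small difference being of the size expected from rounded logarithms. For 4 d.f. such a value is far from significant.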

7. That the sum $\sum_i n_i(\bar{y}_i - \bar{y})^2/\sigma_2^2$ of §83 is distributed like $\chi^2$
with $h - 1$ d.f. may also be shown as follows. Since $\bar{y}$ is the weighted
mean of the $\bar{y}_i$ we have

$$\sum_i n_i(\bar{y}_i - \mu')^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 + n(\bar{y} - \mu')^2. \qquad \text{(ii)}$$

Now $\bar{y}_i - \mu'$ is distributed normally about zero as mean with
variance $\sigma_2^2/n_i$. Consequently $n_i(\bar{y}_i - \mu')^2/\sigma_2^2$ is a $\chi^2$ with 1 d.f.; and,
on summing for all the arrays, we see that the first member of (ii)
divided by $\sigma_2^2$ is a $\chi^2$ with $h$ d.f. Similarly, $\bar{y} - \mu'$ is distributed
normally about zero as mean with variance $\sigma_2^2/n$; and the last term
in (ii), divided by $\sigma_2^2$, is therefore a $\chi^2$ with 1 d.f. Theorem II of §81
then shows that the first sum on the right of (ii), divided by $\sigma_2^2$, is
a $\chi^2$ with $h - 1$ d.f., as stated.

* Proc. Roy. Soc. A, vol. 160, 1937, pp. 268-82. See also Neyman and
Pearson, 1931, 3.

8. Taking the correlation ratio as positive, deduce from §83 (31)
that the mean value of the correlation ratio of $y$ on $x$, for samples
from an uncorrelated bivariate normal population, is

$$\Gamma(\tfrac12 h)\,\Gamma(\tfrac12(n - 1)) \,/\, \Gamma(\tfrac12(h - 1))\,\Gamma(\tfrac12 n).$$

9. Integrating by parts show that, if r is an even positive integer, 

and, by continuing the process, show that the value of the integral is 

Hence show that the probability, in random sampling, that the 
value of $\chi^2$ with $\nu$ (even) d.f. will exceed $\chi_0^2$ is obtained from this
expression by putting ^ and r = — (Cf. Fisher, 1935, 

1, pp. 356-7.) 

10. The variables $x$, $y$ are normally correlated with coefficient $\rho$.
Show that $u$ and $v$, defined by

$$u = x/\sigma_1 + y/\sigma_2, \qquad v = x/\sigma_1 - y/\sigma_2,$$

are independent normal variates with variances $2(1 + \rho)$ and $2(1 - \rho)$
respectively; and that, if $R$ is the correlation between $u$ and $v$ in
random samples of $n$ pairs from the bivariate population, $R^2$ is a
$\beta_1(\tfrac12, \tfrac12(n - 2))$ variate.



CHAPTER X 


FURTHER TESTS OF SIGNIFICANCE. 
SMALL SAMPLES 


84. Small samples 

The use of the standard errors of statistics in the tests of Chapters 
VI and VII depends upon the fact that, in the case of large samples, 
the sampling distributions of many statistics are approximately 
normal, or at any rate unimodal with the property that a value of 
the statistic, deviating from its mean by more than two or three 
times its s.e., is very unlikely. For small samples, however, the 
distributions of statistics are often far from normal. Moreover, 
the estimate of a parameter of the population made from a small 
sample is not at all reliable. For these reasons the use of standard 
errors in connection with such samples is very limited. The chief 
concern of the theory of small samples is with the distributions of 
various statistics, and the applications of tests of significance based 
upon these distributions. In each application we test a hypothesis 
concerning the source of the sample. This hypothesis may be, for 
instance, that a certain parameter of the population has a specified 
value, or that two given samples were drawn from the same popula¬ 
tion. In each instance the test employed enables us to form a con¬ 
clusion based on considerations of probability. The nature of the 
tests is illustrated by the $\chi^2$ test of §78, which is applicable to
samples of any size. But, as already indicated, the test of goodness 
of fit considered in § 79 can be applied only to large samples. 

We remind the reader of two important results obtained earlier.
First, in simple sampling from a population with mean $\mu$ and
variance $\sigma^2$, the distribution of the mean $\bar{x}$ of the sample of $n$
members has $\mu$ for its mean and $\sigma^2/n$ for its variance (see §60).
This result holds whether the sample is large or small, and whether
the population is normal or not. In the case of a normal population
the distribution of $\bar{x}$ is also normal (see §§ 22 or 82). Secondly, the
statistic $s^2$ defined by

$$(n - 1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (1)$$

is an unbiased estimate of the population variance $\sigma^2$, in the sense
that $E(s^2) = \sigma^2$ (cf. §54 (8)). This estimate of $\sigma^2$ is said to be based
on $n - 1$ d.f., since the variates $x_i - \bar{x}$ are not independent, being
connected by the linear relation $\sum_i (x_i - \bar{x}) = 0$. The number of
independent variates is thus only $n - 1$; and this agrees with the
result of §77, that $(n - 1)s^2/\sigma^2$ is distributed like $\chi^2$ with $n - 1$ d.f.

The following discussion applies to samples of any size. It is 
assumed, however, that the populations from which the samples 
are drawn are normal. The results obtained are therefore strictly 
true only in this case. But they are approximately true, and may be 
usefully applied, in most cases in which the departure of the popula¬ 
tion from normality is not very marked. 


‘Student’s’ Distribution 


85. The statistic t and its distribution

We have seen that, in simple sampling from a normal population
of mean $\mu$ and variance $\sigma^2$, the deviation $\bar{x} - \mu$ is a normal variate,
with mean zero and s.d. $\sigma/\sqrt{n}$. The quotient of $\bar{x} - \mu$ by $\sigma/\sqrt{n}$ is
therefore distributed normally with unit s.d. If, however, in place
of the constant $\sigma$ we use the variable estimate $s$ obtained from the
sample, we have the statistic

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \qquad (2)$$

which is not normally distributed. The distribution of $t$ was first
found by W. S. Gosset, who wrote under the nom de plume of
'Student'. It follows very simply from the theorems of §§67, 73
and 77. For, from its definition,

$$t^2 = \frac{n(\bar{x} - \mu)^2}{s^2} = \frac{\nu n(\bar{x} - \mu)^2}{nS^2},$$

where $nS^2 = \sum_i (x_i - \bar{x})^2$, and $\nu$ is the number of d.f., $n - 1$, on which
the estimate $s^2$ is based. Hence

$$\frac{t^2}{\nu} = \frac{n(\bar{x} - \mu)^2/2\sigma^2}{nS^2/2\sigma^2}.$$

But, in virtue of §67 (Theorem I) and §77, the numerator in the
second member is a Gamma variate with parameter $\tfrac12$, and the
denominator a Gamma variate with parameter $\tfrac12\nu$, and these are
distributed independently of each other (cf. §82 or Ex. VIII, 11).
Consequently $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate; and its distribution is
obtained by substituting $t^2/\nu$ for $x$ in §72 (27), with $l = \tfrac12$ and
$m = \tfrac12\nu$. Thus the probability that, in random sampling, the value
of $t^2$ will fall in the interval $d(t^2)$ is

$$dP = \frac{(t^2/\nu)^{-1/2}\, d(t^2/\nu)}{B(\tfrac12, \tfrac12\nu)\,(1 + t^2/\nu)^{\frac12(\nu+1)}}. \qquad (3)$$

By integrating with respect to $t^2$ from a fixed value $t_0^2$ to infinity,*
we obtain the probability that the value of $t^2$ will exceed $t_0^2$. The
accompanying table contains extracts from a more complete one
given by Fisher. In the body of the table are given the values of $t$
corresponding to certain fixed values of $\nu$ and $P$. Thus, for a specified
number $\nu$ of d.f., the value of $P$ at the top of the column is the
probability that, in random sampling, the numerical value of $t$ in
the body of the table will be exceeded.

The range of $t^2$ is from 0 to $\infty$, and its distribution is given by (3).
The statistic $t$, however, ranges from $-\infty$ to $+\infty$, and its distribution
is therefore

$$dP = \frac{dt}{\sqrt{\nu}\, B(\tfrac12, \tfrac12\nu)\,(1 + t^2/\nu)^{\frac12(\nu+1)}}, \qquad (4)$$

the factor 2 disappearing, since the integral of $dP$ over the whole
range of variation of $t$ must be unity. The distribution (4) is spoken
of as the $t$ distribution corresponding to $\nu$ d.f. And from the above
argument it is clear that any statistic $t$ whose range is from $-\infty$ to
$+\infty$, and which is such that $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate, conforms to
the $t$ distribution for $\nu$ d.f. This important result may also be
expressed in the form of

Theorem I. A statistic $t$ conforms to the $t$ distribution for $\nu$ d.f. if its
range is from $-\infty$ to $+\infty$, and $t^2/\nu$ is expressible as the quotient of two
independent variates, which are distributed like $\chi^2$ with 1 and $\nu$ d.f.
respectively.
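The probabilities $P$ attached to the table of $t$ can be recovered, approximately, by direct numerical integration of the distribution (4), whose ordinate is $1/\{\sqrt{\nu}\, B(\tfrac12, \tfrac12\nu)(1 + t^2/\nu)^{\frac12(\nu+1)}\}$. The sketch below is not Fisher's method of evaluation; it simply applies a midpoint rule over $(-t_0, t_0)$ and subtracts from unity. The function names and the number of steps are assumptions.

```python
import math

def t_density(t, nu):
    """Ordinate of the t distribution (4) for nu degrees of freedom."""
    beta = math.gamma(0.5) * math.gamma(0.5 * nu) / math.gamma(0.5 * (nu + 1))
    return 1.0 / (math.sqrt(nu) * beta * (1.0 + t * t / nu) ** (0.5 * (nu + 1)))

def prob_exceeding(t0, nu, steps=20000):
    """P(|t| > t0): one minus the midpoint-rule integral over (-t0, t0)."""
    h = 2.0 * t0 / steps
    inside = h * sum(t_density(-t0 + (k + 0.5) * h, nu) for k in range(steps))
    return 1.0 - inside

# The value 2.31 is tabulated for nu = 8 under P = 0.05.
p = prob_exceeding(2.31, 8)
print(round(p, 3))
```

For $\nu = 1$ the same function gives one half at $t_0 = 1$, in agreement with the first entry of the table.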


* For a method of evaluating this integral see Fisher, 1^36,1, pp. 36^-60. 



Table 4. Values of |t| with a probability P of being exceeded in random sampling
(ν = number of degrees of freedom)

   ν    P = 0.50     0.10     0.05     0.02     0.01

   1      1.000      6.31    12.71    31.82    63.66
   2      0.816      2.92     4.30     6.96     9.92
   3      0.765      2.35     3.18     4.54     5.84
   4      0.741      2.13     2.78     3.75     4.60
   5      0.727      2.02     2.57     3.36     4.03
   6      0.718      1.94     2.45     3.14     3.71
   7      0.711      1.90     2.36     3.00     3.50
   8      0.706      1.86     2.31     2.90     3.36
   9      0.703      1.83     2.26     2.82     3.25
  10      0.700      1.81     2.23     2.76     3.17
  11      0.697      1.80     2.20     2.72     3.11
  12      0.695      1.78     2.18     2.68     3.06
  13      0.694      1.77     2.16     2.65     3.01
  14      0.692      1.76     2.14     2.62     2.98
  15      0.691      1.75     2.13     2.60     2.95
  16      0.690      1.75     2.12     2.58     2.92
  17      0.689      1.74     2.11     2.57     2.90
  18      0.688      1.73     2.10     2.55     2.88
  19      0.688      1.73     2.09     2.54     2.86
  20      0.687      1.72     2.09     2.53     2.84
  21      0.686      1.72     2.08     2.52     2.83
  22      0.686      1.72     2.07     2.51     2.82
  23      0.685      1.71     2.07     2.50     2.81
  24      0.685      1.71     2.06     2.49     2.80
  25      0.684      1.71     2.06     2.48     2.79
  26      0.684      1.71     2.06     2.48     2.78
  27      0.684      1.70     2.05     2.47     2.77
  28      0.683      1.70     2.05     2.47     2.76
  29      0.683      1.70     2.04     2.46     2.76
  30      0.683      1.70     2.04     2.46     2.75
  35      0.682      1.69     2.03     2.44     2.72
  40      0.681      1.68     2.02     2.42     2.71
  45      0.680      1.68     2.02     2.41     2.69
  50      0.679      1.68     2.01     2.40     2.68
  60      0.678      1.67     2.00     2.39     2.66
   ∞      0.674      1.64     1.96     2.33     2.58

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

The reader should sketch the probability curves for $t^2$ and $t$.
The former is asymptotic to both axes, with ordinate which decreases
continuously as $t^2$ increases. The probability curve of $t$ is symmetrical
about the line $t = 0$. It is asymptotic to the $t$-axis at each end, and
the maximum ordinate is that for $t = 0$. Thus small values of $t^2$ are
more likely than larger values. In this respect $t^2$ differs from $\chi^2$
when the latter has more than 2 d.f.


86. Test for an assumed population mean 

The statistic $t$ and the above table provide a means of testing an
assumed value $\mu$ for the mean of the normal population from which
the random sample was drawn. On the hypothesis that $\mu$ is the
true value, the equation (2) and the values of $\bar{x}$ and $s$ derived from
the sample enable us to calculate $t$. The table then gives the probability
$P$ that this value will be exceeded numerically in random
sampling from a normal population with mean $\mu$. If $P$ is less than
0.05 we regard our value of $t$ as significant. If $P$ is less than 0.01
we regard it as highly significant. A significant value of $t$ throws
doubt on the truth of the hypothesis that $\mu$ is the mean of the
population.

Example. A random sample of nine from the men of a large city gave a
mean height of 68 in.; and the unbiased estimate $s^2$ of the population variance
found from the sample was 4.5 in.² Are these data consistent with the
assumption of a mean height of 68.5 in. for the men of the city?

Large populations of heights of men are known to be approximately
normal. For the given sample

$$\bar{x} = 68.0, \qquad \nu = 9 - 1 = 8, \qquad s = \sqrt{4.5} = 2.12,$$

and therefore, on the hypothesis that $\mu = 68.5$, we have

$$|t| = |\bar{x} - \mu| \sqrt{n}/s = (0.5) \times 3/2.12 = 0.707.$$

From the table we find that, for $\nu = 8$, the probability that this value of $t$
will be exceeded numerically in random sampling is about 0.50. The value
is therefore not at all significant, and the test provides no evidence against
the assumption of a population mean of 68.5 in.

If we make the assumption that $\mu = 69.5$ or 66.5 in. we obtain a value of
$|t|$ three times as great as before, viz. 2.12. The probability that, for $\nu = 8$,
this value will be exceeded in random sampling is greater than 0.05, and the
new value of $t$ is still not significant. There are thus fairly wide limits for the
assumed population mean which, with the data of the sample, will provide
a value of $t$ which is not significant. These limits we proceed to consider.
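The calculation in the above example may be sketched in a few lines. The function below evaluates $t = (\bar{x} - \mu)\sqrt{n}/s$ from the sample mean, the assumed mean, the unbiased variance estimate $s^2 = 4.5$ and the sample size; its name and argument order are illustrative only.

```python
import math

def t_statistic(sample_mean, assumed_mean, variance_estimate, n):
    """Student's t for an assumed population mean, as in equation (2)."""
    s = math.sqrt(variance_estimate)
    return (sample_mean - assumed_mean) * math.sqrt(n) / s

t_a = t_statistic(68.0, 68.5, 4.5, 9)   # hypothesis: mean 68.5 in.
t_b = t_statistic(68.0, 69.5, 4.5, 9)   # hypothesis: mean 69.5 in.
print(round(abs(t_a), 3), round(abs(t_b), 2))
```

Both numerical values fall short of 2.31, the tabulated value for 8 d.f. at $P = 0.05$, so that neither hypothesis is discredited by the sample.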


87. Fiducial limits* for the population mean 

Suppose that a certain sample from a normal population has a
mean $\bar{x}$ and provides an unbiased estimate $s^2$ of the population
variance based on $\nu$ d.f. We wish to find limits to the assumed
population mean $\mu$ so that, with the data of the sample, it will lead
to a value of $t$ that is not significant. If our choice is the 5 % level
of significance, we define the 95 % confidence range for $\mu$ as that
range of values of $\mu$ which, with the data of the sample, will furnish
a value of $|t|$ less than the value $t_1$ which corresponds to $P = 0.05$.
This requires

$$|\bar{x} - \mu| \sqrt{n}/s < t_1,$$

so that

$$\bar{x} - st_1/\sqrt{n} < \mu < \bar{x} + st_1/\sqrt{n}. \qquad (5)$$

Consequently $\mu$ must lie within the range extending from $\bar{x} - st_1/\sqrt{n}$
to $\bar{x} + st_1/\sqrt{n}$, which is called the 95 % confidence range for $\mu$
corresponding to the given sample; and the bounding values of this range
are the corresponding confidence limits, or fiducial limits, for the
mean of the population. Similarly we have confidence ranges and
limits corresponding to other levels of significance. In each case the
appropriate value of $t_1$ is found from the table of $t$.

* Cf. Fisher, 1930, 3 and 1933, 1.

Example 1. Find the 95 % fiducial limits for $\mu$ corresponding to the sample
in the example of §86.

Here $\nu = 8$, $t_1 = 2.31$, $s = 2.12$, $n = 9$. Hence

$$st_1/\sqrt{n} = (2.12)(2.31)/3 = 1.63.$$

The required limits are therefore $68 \pm 1.63$, that is 66.37 and 69.63 in.

Example 2. Find the 98 % fiducial limits for $\mu$ corresponding to the same
sample.

From the table we find that $t_1 = 2.90$ is the value of $t$ which is exceeded
numerically with a probability of 2 %. Then

$$st_1/\sqrt{n} = (2.12)(2.90)/3 = 2.05,$$

and the required limits are $68 \pm 2.05$, that is 65.95 and 70.05 in.
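The two examples differ only in the tabulated value of $t$ employed, so the limits $\bar{x} \pm st_1/\sqrt{n}$ may be computed by one small routine. In the sketch below the values 2.31 and 2.90 are read from the table of $t$ for 8 d.f.; the function name is hypothetical.

```python
import math

def fiducial_limits(sample_mean, s, n, t_value):
    """Limits x_bar -/+ s * t1 / sqrt(n) for the population mean."""
    half_width = s * t_value / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low95, high95 = fiducial_limits(68.0, 2.12, 9, 2.31)   # P = 0.05
low98, high98 = fiducial_limits(68.0, 2.12, 9, 2.90)   # P = 0.02
print(round(low95, 2), round(high95, 2))
print(round(low98, 2), round(high98, 2))
```

The higher the level of confidence demanded, the larger the tabulated $t$ and the wider the resulting range.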

88. Comparison of the means of two samples

Given two independent samples of $n_1$ and $n_2$ members, with means
$\bar{x}_1$ and $\bar{x}_2$ respectively, we may use the $t$ distribution to decide
whether the means differ significantly, or whether the two samples
may be regarded as drawn from the same normal population.* We
test the hypothesis that they are from the same normal population.

Let $x_i$ ($i = 1, \dots, n_1$) be the values of the variable in the first sample,
and $x'_j$ ($j = 1, \dots, n_2$) those in the second. Then the sums of squares
for the two samples are

$$n_1S_1^2 = \sum_i (x_i - \bar{x}_1)^2, \qquad n_2S_2^2 = \sum_j (x'_j - \bar{x}_2)^2$$

respectively. Also, if $\sigma^2$ is the variance of the population, $n_1S_1^2/\sigma^2$
and $n_2S_2^2/\sigma^2$ are distributed like $\chi^2$ with $n_1 - 1$ and $n_2 - 1$ d.f.
respectively. Consequently $(n_1S_1^2 + n_2S_2^2)/\sigma^2$ is a $\chi^2$ with $\nu$ d.f.,
where

$$\nu = n_1 + n_2 - 2.$$

An unbiased estimate of the population variance, obtained from the
samples, is

$$s^2 = (n_1S_1^2 + n_2S_2^2)/\nu,$$

since

$$E(\nu s^2) = E(n_1S_1^2 + n_2S_2^2) = (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2 = \nu\sigma^2,$$

so that $E(s^2) = \sigma^2$.

Now, in virtue of the hypothesis, $\bar{x}_1$ and $\bar{x}_2$ are normally distributed
about the population mean with variances $\sigma^2/n_1$ and
$\sigma^2/n_2$ respectively. Therefore, since the samples are independent, the
difference $\bar{x}_1 - \bar{x}_2$ is normally distributed about zero with variance
$\sigma^2(1/n_1 + 1/n_2)$. If we define a statistic $t$ by the equation

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}, \qquad (6)$$

we have

$$\frac{t^2}{\nu} = \frac{(\bar{x}_1 - \bar{x}_2)^2/\sigma^2(1/n_1 + 1/n_2)}{(n_1S_1^2 + n_2S_2^2)/\sigma^2}.$$

The numerator and denominator of the second member are distributed
independently like $\chi^2$ with 1 and $\nu$ d.f. respectively. Hence,
by the Theorem of §85, the statistic $t$ conforms to the $t$ distribution
for $\nu$ d.f. It is the quotient of the normal deviate $\bar{x}_1 - \bar{x}_2$ by the
estimate of its s.e. derived from the samples. If the value of $t$
obtained from the samples is significant, our hypothesis is discredited.

* Cf. Fisher, 1926, 1, pp. 90-8.

We might vary the above argument to test the hypothesis that
the samples were drawn from different normal populations, with
means $\mu$ and $\mu'$ respectively, but the same variance. In this case
$\bar{x}_1 - \mu$ and $\bar{x}_2 - \mu'$ are normally distributed about zero with variances
$\sigma^2/n_1$ and $\sigma^2/n_2$ respectively. Hence their difference

$$(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')$$

is normally distributed about zero with variance $\sigma^2(1/n_1 + 1/n_2)$.
This difference takes the place of $\bar{x}_1 - \bar{x}_2$ in the above argument, so
that

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')}{s\sqrt{1/n_1 + 1/n_2}} \qquad (7)$$

conforms to the $t$ distribution for $\nu$ d.f.


Example 1. Show that the 95 % fiducial limits for the difference of the
means of the populations are $\bar{x}_1 - \bar{x}_2 \pm t_1 s\sqrt{1/n_1 + 1/n_2}$, where $t_1$ is the value
of $t$ corresponding to $P = 0.05$ and $\nu$ d.f. If zero lies outside the above range
of $\mu - \mu'$, we conclude that the difference between the means of the samples is
significant of a difference between the means of the populations.

Example 2. The heights of the ten men of a random sample from an
unknown population gave a mean of 69 in., and a sum of squares of deviations
from the mean equal to 42 in.² Apply the $t$ test to the hypothesis that this
sample is from the same population as that of the example in §86.

From the data we have

$$\bar{x}_1 = 68, \quad \bar{x}_2 = 69, \quad n_1S_1^2 = 36, \quad n_2S_2^2 = 42, \quad n_1 = 9, \quad n_2 = 10.$$

The estimate $s^2$ of the population variance is

$$s^2 = (36 + 42)/17 = 4.59,$$

so that $s = 2.14$. The value of $t$ from the sample is then given by

$$|t| = \frac{69 - 68}{2.14\sqrt{1/9 + 1/10}} = 1.02.$$

For $\nu = 17$ this value is not at all significant. The test therefore provides no
evidence against the hypothesis.
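The working of Example 2 may be sketched as follows. The routine takes the two means, the two sums of squares of deviations ($n_1S_1^2 = 36$, $n_2S_2^2 = 42$) and the sample sizes, and returns the statistic $t$ of (6), the number $\nu$ of d.f. and the pooled estimate $s^2$. The function name and the order of the returned values are illustrative choices.

```python
import math

def two_sample_t(mean1, mean2, ss1, ss2, n1, n2):
    """Pooled-variance t for the difference of two sample means."""
    nu = n1 + n2 - 2
    s2 = (ss1 + ss2) / nu                       # unbiased pooled estimate
    t = (mean1 - mean2) / math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))
    return t, nu, s2

t, nu, s2 = two_sample_t(68.0, 69.0, 36.0, 42.0, 9, 10)
print(round(t, 2), nu, round(s2, 2))
```

The sign of $t$ is negative because the first mean is the smaller; only the numerical value is compared with the table, and for 17 d.f. it is well below the tabulated 2.11 for $P = 0.05$.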


89. Significance of an observed correlation 
The distribution and table of $t$ may also be used to test the
significance of a value $r$ of the correlation coefficient, given by a
random sample of $n$ pairs of values from a bivariate normal population.
The s.e. of $r$ may not be used in the case of small samples,
since the distribution of $r$ is then far from normal. We have seen
that, if the variables in the normal population are uncorrelated,
the value of $r^2$ in random samples of $n$ pairs is a $\beta_1(\tfrac12, \tfrac12(n - 2))$
variate. Therefore, by Theorem VII of §72, $r^2/(1 - r^2)$ is a
$\beta_2(\tfrac12, \tfrac12(n - 2))$ variate. Thus, if $t$ is defined by

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad (8)$$

then $t^2/(n - 2)$ is a $\beta_2(\tfrac12, \tfrac12(n - 2))$ variate, so that $t$ conforms to the
$t$ distribution for $n - 2$ d.f.

To test the significance of the sample value of r, we make the 
assumption that the variables in the population are uncorrelated. 
A simple calculation then gives the value of t corresponding to the 
sample; and the table of t then tells whether the value obtained is a 
rare one. If it is, our assumption is discredited, and we conclude 
that the variables in the population are probably correlated. 


Example 1. Is a correlation coefficient of 0.5 significant, if obtained
from a random sample of 11 pairs of values from a normal population?

Here $r = \tfrac12$, $\nu = 9$, $t = 3r/\sqrt{3/4} = \sqrt{3} = 1.73$. From the table we find
that the probability of obtaining a value of $t$ larger than this is greater than
0.10. Hence there is no reason to suspect the hypothesis of uncorrelated
variables in the population. The value 0.5 is not significant.

Example 2. Find the least value of $r$, in a sample of 27 pairs from a normal
population, that is significant at the 5 % level.

Here $\nu = 25$ and, at the 5 % level of significance, $t = 2.06$. Hence for $r$ to
be significant we must have

$$\frac{5r}{\sqrt{1 - r^2}} > 2.06,$$

which requires $r^2 > 0.145$ and therefore $|r| > 0.38$. Values of $r$ numerically
less than 0.38 are not significant at the 5 % level.


The accompanying short table gives the least values of $|r|$ that
are significant at the 5 % level, for samples of different sizes from a
normal population. These values may be calculated as in Ex. 2
above. The number $\nu$ of degrees of freedom is $n - 2$, where $n$ is the
number of pairs of values in the sample.

Table 5. Minimum values of r that are significant at the 5 % level

   ν      r     |    ν      r     |    ν      r     |    ν      r
   4    0.811   |   11    0.553   |   18    0.444   |   45    0.288
   5    0.755   |   12    0.532   |   19    0.433   |   50    0.273
   6    0.707   |   13    0.514   |   20    0.423   |   60    0.250
   7    0.666   |   14    0.497   |   25    0.381   |   70    0.232
   8    0.632   |   15    0.482   |   30    0.349   |   80    0.217
   9    0.602   |   16    0.468   |   35    0.325   |   90    0.205
  10    0.576   |   17    0.456   |   40    0.304   |  100    0.195

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.
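The calculation of Ex. 2 amounts to solving $r\sqrt{\nu}/\sqrt{1 - r^2} = t$ for $r$, which gives $r = t/\sqrt{t^2 + \nu}$. A minimal sketch follows; the tabulated $t$ must be supplied by the reader from the table of $t$, and the function names are hypothetical.

```python
import math

def t_from_r(r, nu):
    """The statistic t of (8), with nu = n - 2 degrees of freedom."""
    return r * math.sqrt(nu) / math.sqrt(1.0 - r * r)

def least_significant_r(t_value, nu):
    """Least |r| reaching the tabulated t_value: t / sqrt(t^2 + nu)."""
    return t_value / math.sqrt(t_value ** 2 + nu)

r_min = least_significant_r(2.06, 25)   # Ex. 2: 27 pairs, 5 % level
t_ex1 = t_from_r(0.5, 9)                # Ex. 1: r = 0.5 from 11 pairs
print(round(r_min, 3), round(t_ex1, 2))
```

The two functions are inverse to one another for positive arguments, which provides a simple check on either.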


90. Significance of an observed regression coefficient 
The significance of an observed value, $b$, of the linear regression
coefficient of $y$ on $x$, in a random sample from a normal population,
may also be tested by the $t$ distribution and table. It was shown in
§83 (Theorem V) that, for random samples of $n$ pairs from an
uncorrelated normal population, $b^2\sigma_1^2/\sigma_2^2$ is a $\beta_2(\tfrac12, \tfrac12(n - 1))$ variate.
Consequently the statistic $t$ defined by

$$t = b\sigma_1\sqrt{n - 1}/\sigma_2 \qquad (9)$$

conforms to the $t$ distribution for $n - 1$ d.f. But, since $\sigma_1$ and $\sigma_2$
are usually unknown, this relation is not of much use in providing
a test of significance for $b$. However, a similar result is free from this
objection. For, from the relation

$$b = rS_2/S_1, \qquad (10)$$

in which $S_1^2$ and $S_2^2$ are the variances of $x$ and $y$ in the sample, it
follows that

$$\frac{b^2 \sum_i (x_i - \bar{x})^2}{\sum_i (y_i - Y_i)^2} = \frac{nr^2S_2^2/\sigma_2^2}{n(1 - r^2)S_2^2/\sigma_2^2},$$

where $Y_i$ as usual denotes the estimate of $y_i$ found from the regression
equation. Now, by §82, the numerator and denominator of the
second member are distributed like $\chi^2$ with 1 and $n - 2$ d.f. respectively.
The quotient is therefore a $\beta_2(\tfrac12, \tfrac12(n - 2))$ variate, and the
statistic $t$ defined by

$$t = b\sqrt{\frac{(n - 2)\sum_i (x_i - \bar{x})^2}{\sum_i (y_i - Y_i)^2}} \qquad (11)$$

conforms to the $t$ distribution for $n - 2$ d.f., and may be used to
test the significance of the value of $b$ found from the sample. More
generally Fisher* has shown that, if the variables in the population
are correlated, with $\beta$ as coefficient of regression of $y$ on $x$, the
statistic which conforms to the $t$ distribution is obtained from (11)
on replacing $b$ by $(b - \beta)$.

* Fisher, 1922, 2, p. 609.
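Since $b = rS_2/S_1$, the statistic for testing $b$, namely $b\sqrt{(n - 2)\sum(x_i - \bar{x})^2/\sum(y_i - Y_i)^2}$, must coincide with the statistic $r\sqrt{n - 2}/\sqrt{1 - r^2}$ of §89 for the same sample. The sketch below verifies this on a small set of invented figures; the data and the function name are hypothetical and serve only as an illustration.

```python
import math

def regression_t(xs, ys):
    """Return b, the t statistic based on b, and the t statistic based on r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                                # regression of y on x
    # Sum of squared deviations of y from the fitted regression line.
    residual = sum((y - (y_bar + b * (x - x_bar))) ** 2
                   for x, y in zip(xs, ys))
    t_from_b = b * math.sqrt((n - 2) * sxx / residual)
    r = sxy / math.sqrt(sxx * syy)
    t_from_r = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return b, t_from_b, t_from_r

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # invented values, for illustration only
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]
b, t_b, t_r = regression_t(xs, ys)
print(round(b, 3), round(t_b, 3), round(t_r, 3))
```

The agreement of the two statistics shows that, in an uncorrelated population, the tests of significance of $b$ and of $r$ are one and the same test.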





91. Distribution of the range of a sample. 


The range, w, of a sample of values is the difference between 
the highest and lowest values. This concept is widely used in 
the statistical control of quality in mass production by an 
industrial plant. Though the range does not conform to the t 
distribution, it is convenient here to consider the sampling distribution 
of w for samples of n values from a continuous population, whose 
relative frequency density is f(x) in the interval (a, b). Consider 
the infinitesimal intervals (u, u + du) and (v, v + dv) within the 
range of variation of x, and such that u < v. Then the probability 
that, at any drawing, the value chosen will lie in the first of these 
is f(u) du, and in the second f(v) dv. The probability that it will 
lie in the intervening interval is 

$$p = \int_{u+du}^{v} f(x)\,dx. \qquad \text{(i)}$$

Consequently the probability that, in the drawing of n values, 
one will lie in the interval du, one in the interval dv, and the 
remaining n − 2 in the intervening interval is 

$$dP = n(n-1)\,p^{n-2} f(u) f(v)\,du\,dv, \qquad \text{(ii)}$$

since n(n − 1) is the number of ways in which one of the n values 
may fall in each of the intervals du and dv. In other words, dP 
is the probability that the lowest value in the sample will fall in 
the interval du, and the highest in the interval dv. 

We may express this in terms of u and w instead of u and v. 
Since w = v − u the Jacobian of u, w with respect to u, v is unity, 
and the probability that the lowest value will fall in the interval 
du, and w in the interval dw, is therefore 

$$n(n-1)\,f(u) f(u+w) \left(\int_u^{u+w} f(x)\,dx\right)^{n-2} du\,dw \qquad \text{(iii)}$$

to within infinitesimals of this order. Summing for all intervals 
du consistent with a range w, we find the probability that w will 
fall in the interval dw, irrespective of the value of u, as 

$$dP = n(n-1)\left[\int_a^{b-w} f(u) f(u+w) \left(\int_u^{u+w} f(x)\,dx\right)^{n-2} du\right] dw. \qquad (12)$$





196 Tests of Significance [x 

This is the required probability differential of w. Denoting the 
coefficient of dw by $\phi(w)$ we have the expected value of w as 

$$E(w) = \int_0^{b-a} w\,\phi(w)\,dw.$$

This expected value has been calculated by numerical integration 
for the case of a normal population of s.d. $\sigma$, and for various 
values of n. Among the results found are 

    n:          2     3     4     5     6     10    50 
    E(w)/σ:     1.13  1.69  2.06  2.33  2.53  3.08  4.50 
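The tabulated values may be checked by a short numerical integration. The sketch below does not integrate (12) directly; it assumes the standard equivalent form $E(w) = \int [1 - F(x)^n - (1-F(x))^n]\,dx$, where F is the distribution function of the population, and applies the trapezoidal rule to a standard normal variate. The function names and the integration limits are illustrative choices, not taken from the text.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard normal variate."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_range_over_sigma(n, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal estimate of E(w)/sigma for samples of n normal values."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        F = normal_cdf(lo + k * h)
        g = 1.0 - F ** n - (1.0 - F) ** n
        # End ordinates carry half weight in the trapezoidal rule.
        total += 0.5 * g if k in (0, steps) else g
    return total * h

for n in (2, 3, 4, 5, 6, 10, 50):
    print(n, round(expected_range_over_sigma(n), 2))
```

For n = 2 the integral has the closed value $2/\sqrt{\pi}$, which provides a convenient check on the numerical work.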


Example 1. If the variable in the population has a uniform distribution 
from 0 to b, then f(x) = 1/b. Show that the probability differential 
of the range is $n(n-1)\,w^{n-2}(b-w)\,dw/b^n$, and that the expected value of 
w is (n − 1) b/(n + 1). 


Example 2. Show that, in taking a random sample of n numbers 
between zero and unity, the probability that the range will exceed 0.5 
is $1 - (n+1)/2^n$. Show also that 8 is the least value of n for which this 
probability exceeds 0.96. 
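The result of Example 2 is easily verified by machine. The following Python sketch evaluates the closed form $1 - (n+1)/2^n$, compares it with a direct midpoint-rule integration of the density $n(n-1)w^{n-2}(1-w)$ of the range over (0.5, 1), and searches for the least n giving a probability above 0.96. The function names are illustrative only.

```python
def prob_range_exceeds_half(n):
    """P(range > 0.5) for n values drawn uniformly from (0, 1)."""
    return 1.0 - (n + 1) / 2 ** n

def prob_by_integration(n, steps=20000):
    """Midpoint-rule integral of n(n-1) w^(n-2) (1-w) over (0.5, 1)."""
    h = 0.5 / steps
    total = 0.0
    for k in range(steps):
        w = 0.5 + (k + 0.5) * h
        total += n * (n - 1) * w ** (n - 2) * (1.0 - w)
    return total * h

# Least sample size for which the probability exceeds 0.96.
least_n = next(n for n in range(2, 100) if prob_range_exceeds_half(n) > 0.96)
print(least_n, prob_range_exceeds_half(least_n))
```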


Distribution of the Variance Ratio 

92. Ratio of Independent estimates of the population variance 

As in § 88, let us consider two independent random samples whose 
values are $x_{1i}$ (i = 1, ..., $n_1$) and $x_{2j}$ (j = 1, ..., $n_2$), with means 
$\bar{x}_1$ and $\bar{x}_2$ respectively. These provide estimates $s_1^2$ and $s_2^2$ of the 
variances of the populations, given by 

$$s_1^2 = \frac{1}{n_1-1}\sum_i (x_{1i}-\bar{x}_1)^2, \qquad s_2^2 = \frac{1}{n_2-1}\sum_j (x_{2j}-\bar{x}_2)^2,$$

corresponding to $\nu_1$ and $\nu_2$ d.f. respectively, where $\nu_1 = n_1 - 1$ and 
$\nu_2 = n_2 - 1$. 

We wish to consider whether two such estimates are significantly 
different, or whether the samples may be regarded as drawn from 
the same normal population of variance $\sigma^2$. 




An appropriate test is furnished by the sampling distribution of 
the ratio F of two such estimates of $\sigma^2$ obtained from independent 
samples from the same normal population. Thus if 

$$F = s_1^2/s_2^2,$$

then 

$$\frac{\nu_1 F}{\nu_2} = \frac{\nu_1 s_1^2/\sigma^2}{\nu_2 s_2^2/\sigma^2}. \qquad (13)$$

Now the numerator and denominator of the second member are 
distributed independently like $\chi^2$ with $\nu_1$ and $\nu_2$ d.f. respectively. 
Hence the quotient is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate, so that $\nu_1 F/\nu_2$ 
conforms to the distribution of § 72, with $l = \tfrac12\nu_1$ and $m = \tfrac12\nu_2$. 
Consequently the probability that the value of F will fall in the interval 
dF is 

$$dP = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}\,F^{(\nu_1-2)/2}\,dF}{B(\tfrac12\nu_1, \tfrac12\nu_2)\,(\nu_1 F+\nu_2)^{(\nu_1+\nu_2)/2}}. \qquad (14)$$

This is the required distribution of the variance ratio for $\nu_1$ and $\nu_2$ d.f. 
It will be observed that the distribution is independent of the 
variance $\sigma^2$ of the population. 



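Since $\nu_1 F/\nu_2$ is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate its modal value, by § 72 (28), 
is $(\tfrac12\nu_1 - 1)/(\tfrac12\nu_2 + 1)$. Consequently the modal value of F is given by 

$$F = \frac{\nu_2(\nu_1-2)}{\nu_1(\nu_2+2)},$$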




and this is always less than unity. Similarly, the mean value of 
$\nu_1 F/\nu_2$, by § 72 (29), is $\tfrac12\nu_1/(\tfrac12\nu_2 - 1)$, and therefore the expected 
value of F is 

$$E(F) = \frac{\nu_2}{\nu_2-2},$$

which is independent of $\nu_1$ and is always greater than unity. The 
probability curve for F depends, of course, on both $\nu_1$ and $\nu_2$, but 
its main features, for $\nu_1 > 4$, are those of the curve in Fig. 9. 


93. Fisher’s z distribution. Table of F 


A distribution equivalent to (14) was first obtained by R. A. Fisher. 
Writing $z = \tfrac12\log_e F$ and therefore $F = e^{2z}$ in the above result, we 
deduce immediately that 

$$dP = \frac{2\,\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}\,e^{\nu_1 z}\,dz}{B(\tfrac12\nu_1, \tfrac12\nu_2)\,(\nu_1 e^{2z}+\nu_2)^{(\nu_1+\nu_2)/2}}. \qquad (15)$$

This is Fisher's z distribution. The probability that a specified value 
of z will be exceeded in random sampling depends upon $\nu_1$ and $\nu_2$. 
Fisher published tables* giving the values of z that will be exceeded 
with probabilities 0.05 and 0.01 respectively, corresponding to 
specified values of $\nu_1$ and $\nu_2$. From these G. W. Snedecor prepared 
a table† for the variance ratio, which he denoted by F in honour of 
Fisher. Extracts from this table are here printed by permission of 
Snedecor and the Iowa Press. The ratio, F, tabulated is that of the 
larger estimate of variance to the smaller. The number of degrees 
of freedom $\nu_1$ corresponding to the larger estimate determines the 
column in the table, while $\nu_2$ determines the row. At the 
intersection of the row and the column are given two values of F. The 
upper is the value that will be exceeded with a probability 0.05, 
and the lower with a probability 0.01. These are often referred to 
as the 5 and 1 % 'points' of F. The latter is, of course, always the 
larger. The hypothesis to be tested is that the samples are from the 
same normal population, or from normal populations of equal 
variance. A value of F less than the 5 % point is not significant. 
A value between the 5 and 1 % points is significant at the former 


* Statistical Methods for Research Workers, 1926, 
† Snedecor, 1934, 2 or 1938, 3, pp. 184-7. 





Table 6. The variance ratio, 5 and 1 % 'points' of F 

$\nu_1$ is the number of degrees of freedom for the greater 
estimate of variance, and $\nu_2$ for the smaller 

 ν2 \ ν1         1      2      3      4      5      6      8     12     24      ∞ 

   2    5 %   18.51  19.00  19.16  19.25  19.30  19.33  19.37  19.41  19.45  19.50 
        1 %   98.49  99.00  99.17  99.25  99.30  99.33  99.36  99.42  99.46  99.50 
   3    5 %   10.13   9.55   9.28   9.12   9.01   8.94   8.84   8.74   8.64   8.53 
        1 %   34.12  30.82  29.46  28.71  28.24  27.91  27.49  27.05  26.60  26.12 
   4    5 %    7.71   6.94   6.59   6.39   6.26   6.16   6.04   5.91   5.77   5.63 
        1 %   21.20  18.00  16.69  15.98  15.52  15.21  14.80  14.37  13.93  13.46 
   5    5 %    6.61   5.79   5.41   5.19   5.05   4.95   4.82   4.68   4.53   4.36 
        1 %   16.26  13.27  12.06  11.39  10.97  10.67  10.27   9.89   9.47   9.02 
   6    5 %    5.99   5.14   4.76   4.53   4.39   4.28   4.15   4.00   3.84   3.67 
        1 %   13.74  10.92   9.78   9.15   8.75   8.47   8.10   7.72   7.31   6.88 
   7    5 %    5.59   4.74   4.35   4.12   3.97   3.87   3.73   3.57   3.41   3.23 
        1 %   12.25   9.55   8.45   7.85   7.46   7.19   6.84   6.47   6.07   5.65 
   8    5 %    5.32   4.46   4.07   3.84   3.69   3.58   3.44   3.28   3.12   2.93 
        1 %   11.26   8.65   7.59   7.01   6.63   6.37   6.03   5.67   5.28   4.86 
   9    5 %    5.12   4.26   3.86   3.63   3.48   3.37   3.23   3.07   2.90   2.71 
        1 %   10.56   8.02   6.99   6.42   6.06   5.80   5.47   5.11   4.73   4.31 
  10    5 %    4.96   4.10   3.71   3.48   3.33   3.22   3.07   2.91   2.74   2.54 
        1 %   10.04   7.56   6.55   5.99   5.64   5.39   5.06   4.71   4.33   3.91 
  12    5 %    4.75   3.88   3.49   3.26   3.11   3.00   2.85   2.69   2.50   2.30 
        1 %    9.33   6.93   5.95   5.41   5.06   4.82   4.50   4.16   3.78   3.36 
  14    5 %    4.60   3.74   3.34   3.11   2.96   2.85   2.70   2.53   2.35   2.13 
        1 %    8.86   6.51   5.56   5.03   4.69   4.46   4.14   3.80   3.43   3.00 
  16    5 %    4.49   3.63   3.24   3.01   2.85   2.74   2.59   2.42   2.24   2.01 
        1 %    8.53   6.23   5.29   4.77   4.44   4.20   3.89   3.55   3.18   2.75 
  18    5 %    4.41   3.55   3.16   2.93   2.77   2.66   2.51   2.34   2.15   1.92 
        1 %    8.28   6.01   5.09   4.58   4.25   4.01   3.71   3.37   3.01   2.57 
  20    5 %    4.35   3.49   3.10   2.87   2.71   2.60   2.45   2.28   2.08   1.84 
        1 %    8.10   5.85   4.94   4.43   4.10   3.87   3.56   3.23   2.86   2.42 
  25    5 %    4.24   3.38   2.99   2.76   2.60   2.49   2.34   2.16   1.96   1.71 
        1 %    7.77   5.57   4.68   4.18   3.86   3.63   3.32   2.99   2.62   2.17 
  30    5 %    4.17   3.32   2.92   2.69   2.53   2.42   2.27   2.09   1.89   1.62 
        1 %    7.56   5.39   4.51   4.02   3.70   3.47   3.17   2.84   2.47   2.01 
  40    5 %    4.08   3.23   2.84   2.61   2.45   2.34   2.18   2.00   1.79   1.51 
        1 %    7.31   5.18   4.31   3.83   3.51   3.29   2.99   2.66   2.29   1.81 
  60    5 %    4.00   3.15   2.76   2.52   2.37   2.25   2.10   1.92   1.70   1.39 
        1 %    7.08   4.98   4.13   3.65   3.34   3.12   2.82   2.50   2.12   1.60 
  80    5 %    3.96   3.11   2.72   2.49   2.33   2.21   2.06   1.88   1.65   1.32 
        1 %    6.96   4.88   4.04   3.56   3.25   3.04   2.74   2.41   2.03   1.49 


Extracted from G. W. Snedecor’s Statistical Methods, pp. 184-7, by courtesy 
of the author and the Iowa Collegiate Press. 






level, but not at the latter. A value greater than the 1 % point is 
regarded as highly significant. A significant value of F throws doubt 
on the truth of the hypothesis. This test will be used extensively in 
the next chapter. It is illustrated by the following numerical 
example. 


Example. Apply the test to the samples of heights given in the examples 
of §§ 86 and 88 (Ex. 2). 

The two estimates of the population variance furnished by the samples 
are 36/8 and 42/9, corresponding to 8 and 9 d.f. respectively. The second is 
the larger, so that 

$$F = \frac{42/9}{36/8} = 1.04, \qquad \nu_1 = 9, \quad \nu_2 = 8.$$

This value of F is much below the 5 % point as indicated by the table. It is 
therefore not at all significant, so that the samples may very well be regarded 
as drawn from the same population. 
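The arithmetic of this example may be set out in a few lines of Python. The sketch below assumes that 36 and 42 are the sums of squares of deviations in the two samples, with 8 and 9 d.f.; the function name `variance_ratio` is illustrative. It forms the two estimates, places the larger in the numerator, and returns F together with $\nu_1$ and $\nu_2$ in the order required for entering the table.

```python
def variance_ratio(ss1, df1, ss2, df2):
    """Return (F, nu1, nu2), the larger estimate of variance being on top."""
    est1 = ss1 / df1
    est2 = ss2 / df2
    if est1 >= est2:
        return est1 / est2, df1, df2
    return est2 / est1, df2, df1

# Sums of squares and degrees of freedom as quoted in the example.
F, nu1, nu2 = variance_ratio(36, 8, 42, 9)
print(round(F, 3), nu1, nu2)
```

The 5 % point for $\nu_2 = 8$ lies between 3.28 and 3.44 for the neighbouring tabulated columns $\nu_1 = 12$ and $\nu_1 = 8$, so that the computed ratio falls far short of significance.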


Fisher’s Transformation of the 
Correlation Coefficient 


94. Distribution of r. Fisher's transformation 
The distribution of the correlation coefficient r, in random 
samples of n pairs of values from a bivariate normal population in 
which the correlation is $\rho$, was given by Fisher* in 1915. He showed 
that the probability that the correlation coefficient will have a 
value in the interval dr is 

$$dP = \frac{(1-\rho^2)^{(n-1)/2}\,(1-r^2)^{(n-4)/2}}{\pi\,(n-3)!}\; \frac{d^{\,n-2}}{d(r\rho)^{n-2}} \left\{\frac{\arccos(-r\rho)}{\sqrt{1-r^2\rho^2}}\right\} dr.$$

This distribution is far from normal, with a probability curve which 
is very skew in the neighbourhood of $\rho = \pm 1$, even for large samples. 
The use of the s.e. of r is therefore not to be recommended. In a 
subsequent paper† Fisher showed that the transformation 

$$z = \tfrac12\log_e\frac{1+r}{1-r}, \qquad \zeta = \tfrac12\log_e\frac{1+\rho}{1-\rho} \qquad (16)$$

defines a variate z, whose distribution is approximately normal with 
mean $\zeta$ and variance 1/(n − 3), tending rapidly to normality as the 
size of the sample increases. Thus the s.e. of z is independent of the 


* Fisher, 1915, 2. 

† Fisher, 1921, 1. 




value of r. For the proofs of these properties we must refer the 
reader to Fisher’s papers; but the distribution of r in the case of an 
uncorrelated normal population was considered in §82, and the 
corresponding distribution of z will be discussed in Ex. 1 below. 

By means of the statistic z we may test whether an observed 
correlation coefficient differs significantly from some theoretical 
value, or from some value given in advance; or whether the values 
of r obtained from two samples differ significantly. From the 
values of r and $\rho$ we determine those of z and $\zeta$ by (16); and it is 
then easy to decide whether the deviation $z - \zeta$ is significant for a 
normal distribution of variance 1/(n − 3). To obviate the necessity 
of calculating z in every case, Fisher published a table setting out 
the values of r, which correspond to specified values of z ranging 
from 0 to 3 at intervals of 0.01. Extracts from this table are printed 
herewith. 

Table 7. Fisher's transformation of r 

Values of r for specified values of z at intervals of 0.02 

   z      0.00    0.02    0.04    0.06    0.08 

  0.0    0.000   0.020   0.040   0.060   0.080 
  0.1    0.100   0.119   0.139   0.159   0.178 
  0.2    0.197   0.217   0.236   0.254   0.273 
  0.3    0.291   0.310   0.328   0.345   0.363 
  0.4    0.380   0.397   0.414   0.430   0.446 
  0.5    0.462   0.478   0.493   0.508   0.523 
  0.6    0.537   0.551   0.565   0.578   0.592 
  0.7    0.604   0.617   0.629   0.641   0.653 
  0.8    0.664   0.675   0.686   0.696   0.706 
  0.9    0.716   0.726   0.735   0.744   0.753 
  1.0    0.762   0.770   0.778   0.786   0.793 
  1.1    0.801   0.808   0.814   0.821   0.828 
  1.2    0.834   0.840   0.846   0.851   0.857 
  1.3    0.862   0.867   0.872   0.876   0.881 
  1.4    0.885   0.890   0.894   0.898   0.902 
  1.5    0.905   0.909   0.912   0.915   0.919 
  1.6    0.922   0.925   0.928   0.930   0.933 
  1.7    0.935   0.938   0.940   0.943   0.945 
  1.8    0.947   0.949   0.951   0.953   0.955 
  1.9    0.956   0.958   0.960   0.961   0.963 


Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 


For the case $\zeta = 0$, that is to say for testing whether an observed 
value of r indicates any correlation in the population, the method 
of § 89 is preferable. 






Example 1. For samples from an uncorrelated normal population, we 
know that the distribution of r is 

$$dP = \frac{(1-r^2)^{(n-4)/2}\,dr}{B(\tfrac12, \tfrac12(n-2))}.$$

Fisher's transformation (16) may be expressed 

$$r = \tanh z,$$

so that 

$$dr = \operatorname{sech}^2 z\,dz,$$

and the distribution of z is therefore 

$$dP = \operatorname{sech}^{n-2} z\,dz / B(\tfrac12, \tfrac12(n-2)).$$

Now sech z is approximately equal to $\exp(-\tfrac12 z^2)$, so that 

$$dP \propto \exp(-\tfrac12(n-2)z^2)\,dz$$

approximately. Consequently the distribution of z is approximately normal, 
with variance 1/(n − 2). As Fisher has shown, however, a better 
approximation to the variance of z is 1/(n − 3). 


Example 2. In a random sample of 28 pairs of values from a bivariate 
normal population, the correlation was found to be 0.7. Is this value 
consistent with the assumption that the correlation in the population is 0.5? 

Here r = 0.7, $\rho$ = 0.5 and n = 28. From the table we find z = 0.87 and 
$\zeta$ = 0.55, so that $z - \zeta$ = 0.32. 

The s.e. of z is $1/\sqrt{25} = 0.2$, so that 

$$(z-\zeta)/(\text{s.e.}) = 1.6.$$

Since $z - \zeta$ is considerably less than twice the s.e., its value is not significant. 
So far as this test goes, the correlation in the population might very well 
be 0.5. 

The 95 % fiducial limits for $\rho$ are found in the usual manner (cf. § 51). 
The value of $\zeta$ must be such that 

$$|z-\zeta| < 1.96\,(\text{s.e.}) = 0.392.$$

Consequently 

$$0.87 - 0.392 < \zeta < 0.87 + 0.392,$$
$$0.48 < \zeta < 1.26,$$

and therefore from the table 

$$0.446 < \rho < 0.851.$$

The 95 % fiducial limits for $\rho$ are therefore 0.45 and 0.85 approximately. 
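The same calculation may be carried out without the table, since (16) is the inverse hyperbolic tangent. The following Python sketch uses `math.atanh` and `math.tanh` with r = 0.7, a hypothetical population value 0.5 and n = 28; the variable names are illustrative. Because the table is entered only to two decimal places in z, the computed figures differ slightly in the third decimal from those found above.

```python
import math

r, rho, n = 0.7, 0.5, 28

z = math.atanh(r)          # z = (1/2) log((1 + r)/(1 - r))
zeta = math.atanh(rho)     # the same transformation applied to rho
se = 1.0 / math.sqrt(n - 3)

# Deviation of z from zeta in units of its standard error.
ratio = (z - zeta) / se

# 95 % limits for zeta, transformed back to limits for rho.
lower = math.tanh(z - 1.96 * se)
upper = math.tanh(z + 1.96 * se)

print(round(z, 3), round(zeta, 3), round(ratio, 2))
print(round(lower, 3), round(upper, 3))
```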


95. Comparison of correlations in independent samples 
Next suppose that two independent samples of $n_1$ and $n_2$ pairs 
give correlation coefficients of $r_1$ and $r_2$ respectively. May they be 
regarded as drawn from the same population; or is the difference 




between $r_1$ and $r_2$ significant? On the assumption that the samples 
are from the same normal population, the difference between the 
values of z for the two samples is normally distributed with s.e. 

$$\epsilon = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}.$$

The two values of z are 

$$z_1 = \tfrac12\log_e\frac{1+r_1}{1-r_1}, \qquad z_2 = \tfrac12\log_e\frac{1+r_2}{1-r_2}.$$

If $|z_1 - z_2|$ is less than $2\epsilon$, the difference is not significant at the 
5 % level; and the assumption that the samples are from the same 
population, or from equally correlated normal populations, is not 
discredited. 

Example. The first of two samples consists of 23 pairs, and gives a 
correlation of 0.5; while the second, of 28 pairs, has a correlation of 0.8. 
Are these values significantly different? 

On the hypothesis that they are from the same normal population, the 
s.e. of the difference of the z's is 

$$\epsilon = \sqrt{\tfrac{1}{20} + \tfrac{1}{25}} = \sqrt{0.09} = 0.3.$$

From the table we find that $z_1$ = 0.55 and $z_2$ = 1.10, so that 

$$|z_1 - z_2| = 0.55 = 1.83\epsilon,$$

which is a little less than $2\epsilon$, and is therefore not quite significant at the 5 % 
level. The hypothesis is not discredited. 

If in the above example the value of $r_1$ is not given, we may find the limits 
between which it must lie in order that $|r_1 - r_2|$ should not be significant at 
the 5 % level. For the condition to be satisfied is 

$$|z_1 - z_2| < 1.96\epsilon = 0.588.$$

Consequently 

$$1.10 - 0.588 < z_1 < 1.10 + 0.588,$$
$$0.512 < z_1 < 1.688,$$

so that 

$$0.47 < r_1 < 0.93.$$
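The comparison of two correlations is readily mechanised. The following is a minimal Python sketch, taking the figures of the example as samples of 23 and 28 pairs with correlations 0.5 and 0.8; the function name `compare_correlations` is illustrative. It returns the difference of the transformed values, its s.e., and their ratio, and then finds the limits for $r_1$ consistent with $r_2$ = 0.8 at the 5 % level.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Return (|z1 - z2|, s.e. of the difference, their ratio)."""
    eps = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    diff = abs(math.atanh(r1) - math.atanh(r2))
    return diff, eps, diff / eps

diff, eps, ratio = compare_correlations(0.5, 23, 0.8, 28)

# Limits for r1 such that |z1 - z2| < 1.96 * eps when r2 = 0.8.
z2 = math.atanh(0.8)
r1_lower = math.tanh(z2 - 1.96 * eps)
r1_upper = math.tanh(z2 + 1.96 * eps)

print(round(eps, 3), round(diff, 3), round(ratio, 2))
print(round(r1_lower, 2), round(r1_upper, 2))
```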


96. Combination of estimates of a correlation coefficient* 
Suppose that k samples of $n_i$ pairs of values (i = 1, ..., k) yield 
correlation coefficients $r_i$. We may wish to enquire whether the 
samples may be regarded as drawn from the same normal population 


* Cf. Yates, 1934, 6. 




(or equally correlated ones); and, if they may be so regarded, to 
obtain a combined estimate of the population value $\rho$. To test the 
homogeneity of the estimates $r_i$, we make the assumption that the 
samples are from equally correlated populations. Then, by means 
of Fisher's transformation (16), we obtain values of variates $z_i$ 
which are approximately normally distributed about a common 
mean, with variances $1/(n_i - 3)$. The estimate of their common 
mean, $\zeta$, which has minimum variance, is obtained by weighting 
the values inversely* as their variances. This estimate $\bar{z}$ is therefore 

$$\bar{z} = \frac{\sum (n_i-3)\,z_i}{\sum (n_i-3)}. \qquad (17)$$

Then, since the variates $z_i$ are approximately normally distributed 
about $\zeta$ with variances $1/(n_i - 3)$, the sum $\sum (n_i-3)(z_i-\bar{z})^2$ is 
distributed approximately as $\chi^2$ with k − 1 d.f., the mean $\bar{z}$ having 
been determined from the data.† The significance of the calculated 
value of this quantity may be ascertained from the table of $\chi^2$. 
We may express the above sum in a form more convenient for 
numerical calculation. Thus 

$$\sum (n_i-3)(z_i-\bar{z})^2 = \sum (n_i-3)\,z_i^2 - \bar{z}^2 \sum (n_i-3)$$
$$= \sum (n_i-3)\,z_i^2 - \Big[\sum (n_i-3)\,z_i\Big]^2 \Big/ \sum (n_i-3).$$

If the calculated value of this expression is not significant as a 
value of $\chi^2$ with k − 1 d.f., the estimates of the correlation in the 
population may be regarded as homogeneous. In that case the 
value of $\bar{z}$ given by (17) is an estimate of the true value, $\zeta$, 
corresponding to the population coefficient $\rho$. The required estimate of 
$\rho$ is then given by 

$$\rho = \tanh \bar{z},$$

and its value may be read off from the table. 

Example. Independent samples of 21, 30, 39, 26 and 35 pairs of values 
yielded correlation coefficients 0.39, 0.61, 0.43, 0.54 and 0.48 respectively. 
May these estimates be regarded as homogeneous? If so, find an estimate of 
the correlation in the population. 

* Cf. Ex. IV, 6. 

† For details of this part of the proof see Ex. X, 10 below. 




The corresponding values of $z_i$ may be taken from the table, and the 
calculation tabulated as follows: 

   r_i      z_i     n_i − 3    (n_i − 3) z_i    (n_i − 3) z_i² 

   0.39    0.412      18          7.416            3.055 
   0.61    0.709      27         19.143           13.572 
   0.43    0.460      36         16.560            7.618 
   0.54    0.604      23         13.892            8.391 
   0.48    0.524      32         16.768            8.786 

  Totals     -       136         73.779           41.422 

The value of $\chi^2$ from the data is therefore 

$$\chi^2 = 41.422 - (73.779)^2/136 = 1.4 \text{ approximately.}$$

For 4 d.f. this value is not at all significant, so that the coefficients may be 
regarded as homogeneous. From (17) we have 

$$\bar{z} = 73.779/136 = 0.5425.$$

The table then gives the estimate of $\rho$ for the population as 0.495 = 0.5 
nearly. 
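The tabulated calculation may be reproduced as follows. This Python sketch takes the five samples as of 21, 30, 39, 26 and 35 pairs with correlations 0.39, 0.61, 0.43, 0.54 and 0.48 (the values consistent with the weights and products in the tabulation); the function name `combine_correlations` is illustrative. The transformed values are computed directly by `math.atanh` rather than read from the table, so that the totals differ a little in the later decimals.

```python
import math

def combine_correlations(rs, ns):
    """Return (z_bar, chi-square, d.f., combined estimate of rho)."""
    weights = [n - 3 for n in ns]            # reciprocals of the variances
    zs = [math.atanh(r) for r in rs]         # Fisher's transformation
    sum_w = sum(weights)
    sum_wz = sum(w * z for w, z in zip(weights, zs))
    sum_wz2 = sum(w * z * z for w, z in zip(weights, zs))
    z_bar = sum_wz / sum_w                   # weighted mean of the z's
    chi2 = sum_wz2 - sum_wz ** 2 / sum_w     # test of homogeneity
    return z_bar, chi2, len(rs) - 1, math.tanh(z_bar)

rs = [0.39, 0.61, 0.43, 0.54, 0.48]
ns = [21, 30, 39, 26, 35]
z_bar, chi2, df, rho_hat = combine_correlations(rs, ns)
print(round(z_bar, 4), round(chi2, 2), df, round(rho_hat, 3))
```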

COLLATERAL READING 
Fisher, 1938, 2, chapters v, vi and vii. 

Fisher, 1915, 2; 1921, 1; 1922, 2; 1924, 2; 1925, 1 and 1936, 1. 
Tippett, 1931, 2, chapter v. 

Yule and Kendall, 1937, 1, chapter xxiii. 

Rider, 1930, 1; 1939, 6, chapters vi and viii. 

Kenney, 1939, 3, part ii, chapter vii and pp. 172-86. 

'Student', 1908, 1 and 2. 

Smith, 1939, 7. 

Aitken, 1939, 1, chapter vii. 

Mills, 1938, 1, chapter xviii, pp. 698-618. 

Rietz, 1938, 6. 

Sawkins, 1940, 3; 1941, 1. 

Pitman, 1937, 3. 

Kendall, 1943, 2, chapter x and pp. 336-47. 


EXAMPLES X 

1. A random sample of 16 values from a normal population 
showed a mean of 41.5 in., and a sum of squares of deviations from 
this mean equal to 135 in.² Show that the assumption of a mean 
of 43.6 in. for the population is not reasonable, and that the 95 % 
fiducial limits for this mean are 39.9 and 43.1 in. 











Another sample of 20 values from an unknown population has a 
mean of 43.0 in., and a sum of squares of deviations from this mean 
equal to 171 in.² Show that the two samples may be regarded as 
from the same normal population. 

2. Nine patients, to whom a certain drink was administered, 
registered the following increments in blood pressure: 7, 3, −1, 4, 
−3, 5, 6, −4, 1. Show that the data do not indicate that the drink 
was responsible for these increments. 

On the null hypothesis the above values are regarded as a random 
sample from a normal population whose mean is zero. The data 
give $\bar{x}$ = 2 and $s^2$ = 63/4. Hence t = 1.51 with $\nu$ = 8. This value is 
not significant. 

3. In testing the superiority of Leake's drill over the ordinary 
drill, plots in the form of long strips were cultivated, two adjacent 
strips being allotted at random to Leake's drill and the ordinary.* 
For ten such pairs of plots the values of the excess of the weight of 
grain from the plot treated by Leake's drill over that obtained by 
use of the ordinary drill were 2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, −0.4, 
0.1 and 0.7. Show that the data furnish strong evidence of the 
superiority of Leake's drill. 

From the data $\bar{x}$ = 0.83 and $\sum (x_i-\bar{x})^2$ = 6.001, so that 
$s^2$ = 6.001/9 = 0.667. 

On the null hypothesis the mean of the population is zero, and 
t = 3.22 for 9 d.f. This value belongs to about the 1 % level of 
significance. There is thus strong evidence against the null 
hypothesis. 

4. For a random sample of 10 pigs, fed on diet A, the increases in 
weight in a certain period were 

10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lb. 

For another random sample of 12 pigs, fed on diet B, the increases 
in the same period were 

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lb. 


* Data from Wishart, 1934, 3, p. 32. 




Show that, by the test of § 88, the mean increases of 12 and 15 lb. 
in the two samples are not significantly different. 

[$s^2$ = (120 + 314)/20 = 21.7; t = 1.5, $\nu$ = 20.] 

5. Show that the estimates of the population variance from the 
samples in Ex. 4 are not significantly different. 

6. Deduce from § 82 (25), that the statistic $r\sqrt{n-2}/\sqrt{1-r^2}$ 
conforms to the t distribution for n − 2 d.f. 

7. A random sample of 18 pairs from a bivariate normal population 
showed a correlation coefficient of 0.3. Is this value significant 
of correlation in the population? Prove by the method of § 89, 
Ex. 2, that the least value of r significant at the 5 % level is 
about 0.47. 

8. A random sample of 19 pairs from a bivariate normal population 
showed a correlation of 0.65. Prove that this is consistent with 
the assumption of a correlation of 0.40 in the population. Also 
show that the 95 % fiducial limits for $\rho$ are 0.28 and 0.85 
approximately. 

A second sample, of 23 pairs, showed a correlation of 0.40. Prove 
by the method of § 95 that the two samples may be regarded as 
from equally correlated populations. 

9. Show that the mean value of the positive square root of a 
Beta variate of the second kind, with parameters l and m, is 

$$\frac{\Gamma(l+\tfrac12)\,\Gamma(m-\tfrac12)}{\Gamma(l)\,\Gamma(m)}.$$

Deduce that the mean value of |t| for $\nu$ d.f. is 

$$\frac{\sqrt{\nu}\;\Gamma(\tfrac12(\nu-1))}{\sqrt{\pi}\;\Gamma(\tfrac12\nu)}.$$

Also show that, for samples from an uncorrelated bivariate normal 
population in which the variances are $\sigma_1^2$ and $\sigma_2^2$, the mean value of 
the modulus of the regression coefficient of y on x is 

$$\frac{\sigma_2\,\Gamma(\tfrac12(n-2))}{\sigma_1\,\Gamma(\tfrac12(n-1))\sqrt{\pi}}.$$




10. To prove that the sum $\sum (n_i-3)(z_i-\bar{z})^2$ of § 96 is distributed 
approximately as $\chi^2$ with k − 1 d.f., we may proceed by the method 
of § 77. Denoting the sum by $T^2$ we have 

$$T^2 = \sum (n_i-3)(z_i-\zeta)^2 - 2(\bar{z}-\zeta)\sum (n_i-3)(z_i-\zeta) + (\bar{z}-\zeta)^2\sum (n_i-3)$$
$$= \sum (n_i-3)(z_i-\zeta)^2 - \xi_1^2, \qquad \text{(i)}$$

where 

$$\xi_1 = \sum (n_i-3)(z_i-\zeta) \Big/ \sqrt{\sum (n_i-3)}.$$

Now introduce an orthogonal linear transformation 

$$\xi_j = \sum_i c_{ji}\,x_i \qquad (j = 1, 2, \ldots, k),$$

where 

$$x_i = (z_i-\zeta)\sqrt{n_i-3},$$

and the first of the $\xi$'s is identical with $\xi_1$ above. Then, since the x's 
are independently and normally distributed about zero with unit 
s.d., so are the $\xi$'s; and in virtue of (i) 

$$T^2 = \sum_{i=1}^{k} x_i^2 - \xi_1^2 = \sum_{i=1}^{k} \xi_i^2 - \xi_1^2 = \sum_{i=2}^{k} \xi_i^2.$$

Consequently $T^2$ is distributed like $\chi^2$ with k − 1 d.f. 

11. Show that, for the t distribution with $\nu$ d.f., the moment of 
order r (even) about the origin is, for r < $\nu$, 

$$\frac{(r-1)(r-3)\cdots 1\cdot\nu^{r/2}}{(\nu-2)(\nu-4)\cdots(\nu-r)}.$$

12. For samples of n from a population with the exponential 
distribution $f(x) = e^{-x}$, x ≥ 0, show that the range conforms to 

$$\phi(w) = (n-1)\,e^{-w}(1-e^{-w})^{n-2},$$

and hence that 

$$E(w) = 1 + \tfrac12 + \tfrac13 + \tfrac14 + \cdots + \frac{1}{n-1}.$$
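The numerical answers quoted for Exs. 2, 3 and 4 may be checked by the short Python sketch below. The data are entered so as to reproduce the stated means (2 for Ex. 2; 12 and 15 lb. for Ex. 4); the function names `one_sample_t` and `two_sample_t` are illustrative, and the second assumes the pooled estimate of variance of § 88.

```python
import math

def one_sample_t(values, mu0=0.0):
    """Return (t, d.f.) for testing a population mean mu0."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu0) / math.sqrt(s2 / n), n - 1

def two_sample_t(a, b):
    """Return (t, d.f., pooled s^2) for the difference of two means."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((v - m1) ** 2 for v in a)
    ss2 = sum((v - m2) ** 2 for v in b)
    s2 = (ss1 + ss2) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2, s2

ex2 = [7, 3, -1, 4, -3, 5, 6, -4, 1]
ex3 = [2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, -0.4, 0.1, 0.7]
diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]

t2, df2 = one_sample_t(ex2)
t3, df3 = one_sample_t(ex3)
t4, df4, s2_pooled = two_sample_t(diet_a, diet_b)
print(round(t2, 2), df2)
print(round(t3, 2), df3)
print(round(t4, 2), df4, round(s2_pooled, 1))
```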



CHAPTER XI 


ANALYSIS OF VARIANCE AND COVARIANCE 

Analysis of Variance 

97. Resolution of the 'sum of squares' 

In the words of its author, R. A. Fisher, analysis of variance is 
the 'separation of the variance ascribable to one group of causes 
from the variance ascribable to other groups'.* It is a procedure by 
which the variation embodied in the data of the sample may be 
resolved into component variations due to independent factors. 
Each of the components yields an estimate of the population vari¬ 
ance; and these estimates are tested for homogeneity by means of 
the F table. 

Consider a random sample of N values of a normally distributed 
variable x. It is frequently possible to arrange these in classes 
according to a certain factor or criterion. For instance, if the 
variable is the price of a certain commodity, the classes may 
correspond to different seasons or to different districts. Or, if the 
variable is the crop yield of a variety of cereal, the classes may 
correspond to different manurial treatments. Let $x_{ij}$ denote the 
value of the jth member in the ith class. Thus the first subscript 
indicates the class, and the second the position in that class. Let $n_i$ 
be the number of members in the ith class, $\bar{x}_i$ the mean value for 
that class, and $\bar{x}$ the general mean for the whole sample of N values. 
Then 

$$N\bar{x} = \sum_i\sum_j x_{ij}, \qquad \sum_i\sum_j (x_{ij}-\bar{x}) = 0 \qquad (1)$$

and 

$$n_i\bar{x}_i = \sum_j x_{ij}, \qquad \sum_j (x_{ij}-\bar{x}_i) = 0. \qquad (2)$$

To resolve the sum of squares of the deviations of the N values $x_{ij}$ 
from the general mean, we may do so first for the members of the 
ith class. Thus, in virtue of § 3 (10), 

$$\sum_j (x_{ij}-\bar{x})^2 = \sum_j (x_{ij}-\bar{x}_i)^2 + n_i(\bar{x}_i-\bar{x})^2,$$

* Fisher, 1938, 2, p. 216. 





210 Analysis of Variance [xi 

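and on summing this result for all the classes we have the required 
resolution of the 'sum of squares' 

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i\sum_j (x_{ij}-\bar{x}_i)^2 + \sum_i n_i(\bar{x}_i-\bar{x})^2. \qquad (3)$$

This formula holds, of course, whether the population is normal 
or not. 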


98. Homogeneous population. One criterion of classification 
Now suppose that the population, from which the random sample 
of N values was drawn, is homogeneous with respect to the factor of 
classification, that is to say, that the factor has no effect upon the 
value of the variate. Then if the population is divided into classes 
according to this factor, the different classes will have the same 
statistical properties. In particular, they will have the same mean 
$\mu$ and the same variance $\sigma^2$, which are the mean and the variance of 
the population. Then, from the various sums in (3), we can obtain 
three unbiased estimates of $\sigma^2$. For, by (7) and (8) of § 54, the 
expected value of the first sum in sampling is $(N-1)\sigma^2$; so that this 
sum, divided by N − 1, gives an unbiased estimate of $\sigma^2$ based on 
N − 1 d.f. Similarly, the values in the ith class constitute a 
random sample whose mean is $\bar{x}_i$, so that 

$$E\Big[\sum_j (x_{ij}-\bar{x}_i)^2\Big] = (n_i-1)\,\sigma^2,$$

and therefore* 

$$E\Big[\sum_i\sum_j (x_{ij}-\bar{x}_i)^2\Big] = \sum_i (n_i-1)\,\sigma^2 = (N-h)\,\sigma^2,$$

where h is the number of classes. Thus the second sum in (3), divided 
by N − h, gives an unbiased estimate of $\sigma^2$ based on N − h d.f. 
And, since the expected values of the two members of (3) must be 
equal, that of the final sum* is $(h-1)\sigma^2$; so that this sum, divided 
by h − 1, gives an unbiased estimate of $\sigma^2$ based on h − 1 d.f. The 
identity 

$$(N-1)\,\sigma^2 = (N-h)\,\sigma^2 + (h-1)\,\sigma^2 \qquad (4)$$

obtained by taking expected values of the various sums in (3), 
shows that degrees of freedom are additive, the number of freedoms 

* See also Ex. XI, 1, below. 




corresponding to the total sum being equal to the sum of the 
freedoms corresponding to the partial sums. The above results are 
usually tabulated as follows: 

Source of variation     D.F.     Sum of squares                          Mean square 

Between class means     h − 1    $\sum_i n_i(\bar{x}_i-\bar{x})^2$          $\sum_i n_i(\bar{x}_i-\bar{x})^2/(h-1)$ 
Within classes          N − h    $\sum_i\sum_j (x_{ij}-\bar{x}_i)^2$        $\sum_i\sum_j (x_{ij}-\bar{x}_i)^2/(N-h)$ 
Total                   N − 1    $\sum_i\sum_j (x_{ij}-\bar{x})^2$          - 

In the columns headed 'D.F.' and 'Sum of squares' the items are 
additive, but not in the last column which gives the estimates of $\sigma^2$. 

The argument so far holds whether the population is normal or 
not. In the case of a homogeneous normal population the results 
follow from the distributions of the various sums in (3). For then 
the first sum divided by $\sigma^2$ is distributed like $\chi^2$ with N − 1 d.f., as 
proved in § 77. The mean value of this sum is therefore $(N-1)\sigma^2$. 
Similarly, $\sum_j (x_{ij}-\bar{x}_i)^2/\sigma^2$ is distributed like $\chi^2$ with $n_i - 1$ d.f.; and 
therefore the second sum in (3), divided by $\sigma^2$, is distributed like 
$\chi^2$ with 

$$\sum_i (n_i-1) = (N-h) \text{ d.f.}$$

The mean value of this sum is therefore $(N-h)\sigma^2$. Similarly for 
the final sum in (3). 

In order to test the homogeneity of the estimates of $\sigma^2$ by means 
of the variance ratio and the F table, it is necessary to assume that 
the population is normal; for this test is founded on that assumption. 
The practice is to compare the estimate 'between class means' with 
that obtained 'within classes'. For the final sum in (3) represents 
the variation due to the factor of classification, and the second sum 
in (3) is the residual variation after the former has been removed. 
If the estimate obtained between classes is significantly greater 
than that within classes, we are justified in concluding that the 
factor of classification exercises an influence on the value of the 
variable. In that case the assumption of homogeneity is discredited, 
and we must regard the population as heterogeneous. If, however, 
the estimates of $\sigma^2$ are not significantly different, the test provides 


no evidence against the hypothesis of a homogeneous population. 
It is important to remember that the variance ratio may be tested 
by means of the F table only if the two estimates of variance are 
statistically independent. Since the mean of a random sample from 
a normal population is distributed independently of its variance, 
the two sums in the second member of (3) are independent, and the 
required condition is satisfied.* 

99. Calculation of the sums of squares 
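In calculating the above sums of squares it is not necessary to 
find the deviations from the various means. We know that the sum 
of the squares of the deviations of N numbers $x_\alpha$ ($\alpha$ = 1, ..., N), from 
their mean $\bar{x}$ is, in virtue of § 3 (11), 

$$\sum_\alpha (x_\alpha-\bar{x})^2 = \sum_\alpha x_\alpha^2 - N\bar{x}^2 = \sum_\alpha x_\alpha^2 - T^2/N,$$

where T is the sum of the numbers. Applying this formula to the 
various sums considered above we have 

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i\sum_j x_{ij}^2 - T^2/N, \qquad (5)$$

the grand total T being given by 

$$T = \sum_i\sum_j x_{ij}.$$

Similarly, summing first for the values in the ith class, we have 

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_i \Big(\sum_j x_{ij}^2 - T_i^2/n_i\Big),$$

where $T_i$ is the sum of the values in the ith class. Consequently 

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_i\sum_j x_{ij}^2 - \sum_i T_i^2/n_i. \qquad (6)$$

Subtracting (6) from (5) we find for the third sum of squares 

$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = \sum_i T_i^2/n_i - T^2/N. \qquad (7)$$

* See also Fisher, 1925, 1 and Irwin, 1934, 4. 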




This result may also be obtained directly. For, since $n_i$ is the 
frequency of the value $\bar{x}_i$, 

$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = \sum_i n_i\bar{x}_i^2 - \Big(\sum_i n_i\bar{x}_i\Big)^2 \Big/ N = \sum_i T_i^2/n_i - T^2/N,$$

as stated. 

Since the deviations from the means are independent of the choice 
of origin, the results obtained by using (5), (6) and (7) are unaltered 
by a change of origin. In other words, if all the values $x_{ij}$ are 
decreased (or increased) by the same constant, the values obtained 
for the three sums of squares are unchanged. The arithmetic may 
often be simplified in this way, large numbers being replaced by 
much smaller ones. 

Example. 35 plots, of approximately equal fertility, were sown with 7 different 
varieties of wheat, 5 plots to each variety, the distribution of varieties among 
the plots being random. The following table gives the yields of grain in 
bushels per acre, the 7 columns corresponding to the different varieties. Do 
the data (fictitious) indicate a significant difference in the yields of the 
varieties? 

    13   15   14   14   17   15   16 
    11   11   10   10   15    9   12 
    10   13   12   15   14   13   13 
    16   18   13   17   19   14   15 
    12   12   11   10   12   10   11 

The classification is according to variety. The number of classes is h = 7,
and the number of items in each class is $n_i = 5$. Consequently N = 35. The
arithmetic is simplified by shifting the origin to (say) x = 12. Diminishing
all the yields by 12 we may rewrite the table:

     1   3   2   2   5   3   4
    -1  -1  -2  -2   3  -3   0
    -2   1   0   3   2   1   1
     4   6   1   5   7   2   3
     0   0  -1  -2   0  -2  -1

from which we have

$$T_i = 2,\ 9,\ 0,\ 6,\ 17,\ 1,\ 7; \qquad T = 42;$$
$$\bar{x}_i = 0.4,\ 1.8,\ 0,\ 1.2,\ 3.4,\ 0.2,\ 1.4.$$

Hence

$$\sum_i\sum_j x_{ij}^2 = 266, \qquad T^2/N = 50.4, \qquad \sum_i T_i^2/n_i = 92.$$

The three sums of squares are therefore, by (5), (6) and (7),

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = 266 - 50.4 = 215.6,$$
$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = 266 - 92 = 174,$$
$$\sum_i n_i(\bar{x}_i-\bar{x})^2 = 92 - 50.4 = 41.6.$$

In tabular form:

Source of variation    D.F.    Sum of squares    Mean square    F
Between varieties        6          41.6            6.933       1.1
Within varieties        28         174              6.214       —
Total                   34         215.6             —          —


For $\nu_1 = 6$ and $\nu_2 = 28$ the value 1.1 of F is not significant. Since the estimates
of variance between varieties and within varieties are not significantly 
different, the experiment as a whole does not indicate significant variation 
in the yields of varieties. 
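
The arithmetic of this example can be checked mechanically. The following is a minimal sketch in Python, not part of the original text; the function name `one_way_sums` and the variable names are illustrative assumptions. It applies formulae (5), (6) and (7) to the table with origin at 12, and then repeats the calculation with the original origin restored to illustrate that the three sums are unaltered.

```python
# Yields with the origin shifted to 12 (rows = plots, columns = varieties),
# as in the rewritten table of the example.
shifted = [
    [1, 3, 2, 2, 5, 3, 4],
    [-1, -1, -2, -2, 3, -3, 0],
    [-2, 1, 0, 3, 2, 1, 1],
    [4, 6, 1, 5, 7, 2, 3],
    [0, 0, -1, -2, 0, -2, -1],
]


def one_way_sums(table):
    """Return (total, within, between) sums of squares; the classes are the columns."""
    n_i = len(table)                  # items in each class
    h = len(table[0])                 # number of classes
    N = n_i * h
    T = sum(sum(row) for row in table)
    raw = sum(x * x for row in table for x in row)
    class_totals = [sum(row[i] for row in table) for i in range(h)]
    class_term = sum(t * t for t in class_totals) / n_i   # sum of T_i^2 / n_i
    correction = T * T / N                                # T^2 / N
    return raw - correction, raw - class_term, class_term - correction


total, within, between = one_way_sums(shifted)
h, N = 7, 35
F = (between / (h - 1)) / (within / (N - h))

# Adding 12 to every value (the original origin) should leave the sums unchanged.
original = [[x + 12 for x in row] for row in shifted]
total_0, within_0, between_0 = one_way_sums(original)

print(round(total, 1), round(within, 1), round(between, 1), round(F, 2))
```

The significance of F is still to be judged from the F table with 6 and 28 d.f.; the sketch only reproduces the sums of squares and the variance ratio.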

100. Two criteria of classification 

Consider next the case in which the N values of the data may 
be classified according to two different criteria, A and B. For
simplicity suppose that A determines h different classes, and B
determines k different groups; also that the hk values of the variable
are such that, in each of the h classes there is one value from each
group, and in each of the k groups one value from each class. For
the purpose of calculation the hk values may be arranged in a
rectangular array of h columns and k rows, the columns corresponding
to classification A and the rows to B. The double suffix
notation will indicate that $x_{ij}$ belongs to the ith class and the jth
group; and, in the rectangular array, this value occurs in the ith
column and the jth row. As before $\bar{x}$ denotes the general mean;
$\bar{x}_i$ is the mean of the values in the ith class, and $\bar{x}_j$ the mean of those
in the jth group.

The argument leading to (3) is valid here also, the resolution
expressed by that equation being with respect to the means of
columns. We may use (3) again to resolve the sum $\sum_i\sum_j (x_{ij}-\bar{x}_i)^2$,
this time, however, with respect to the means of rows. In doing so
we take as a new variable

$$X_{ij} = x_{ij} - \bar{x}_i,$$

each value of the original variable being diminished by the mean of
the column in which it lies. The values $X_{ij}$ may be arranged in h
columns and k rows corresponding to those of $x_{ij}$. Now the general
mean of $X_{ij}$ is zero, since the means of $x_{ij}$ and $\bar{x}_i$ are each $\bar{x}$. Thus
$\bar{X} = 0$. Similarly the mean of the values $X_{ij}$ in the jth row is

$$\bar{X}_j = \bar{x}_j - \bar{x}.$$

Accordingly, on applying (3) to the quantities $X_{ij}$, but taking row
means instead of column means, we have

$$\sum_i\sum_j (X_{ij}-\bar{X})^2 = \sum_j h(\bar{X}_j-\bar{X})^2 + \sum_i\sum_j (X_{ij}-\bar{X}_j)^2,$$

or its equivalent

$$\sum_i\sum_j (x_{ij}-\bar{x}_i)^2 = \sum_j h(\bar{x}_j-\bar{x})^2 + \sum_i\sum_j (x_{ij}-\bar{x}_i-\bar{x}_j+\bar{x})^2. \tag{8}$$

Substituting this value in (3), and remembering that the numbers
$n_i$ are each equal to k, we have the resolution expressed by

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i k(\bar{x}_i-\bar{x})^2 + \sum_j h(\bar{x}_j-\bar{x})^2 + \sum_i\sum_j Y_{ij}^2, \tag{9}$$

where $Y_{ij} = x_{ij}-\bar{x}_i-\bar{x}_j+\bar{x}$.

As in the preceding section, the expected value of the first member
of (9) is $(hk-1)\sigma^2$. Similarly, on the assumption of a homogeneous
population, the expected values of the first two sums in the second
member are $(h-1)\sigma^2$ and $(k-1)\sigma^2$ respectively. That of the final
sum is therefore $(hk-h-k+1)\sigma^2$. Thus the various sums in (9),
divided by (hk-1), (h-1), (k-1) and (h-1)(k-1) respectively,
give unbiased estimates of the population variance based on degrees
of freedom represented by these divisors. When the population is
normal the four sums in (9), divided by $\sigma^2$, are distributed like $\chi^2$
with degrees of freedom as stated above; and the mean values of
the various sums follow from this. The above results are usually
tabulated:


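Variation          D.F.          Sum of squares                       Mean square
Between classes    h-1           $k\sum_i(\bar{x}_i-\bar{x})^2$       The quotient of 'sum
Between groups     k-1           $h\sum_j(\bar{x}_j-\bar{x})^2$       of squares' by D.F.
Error              (h-1)(k-1)    $\sum_i\sum_j Y_{ij}^2$              in each case
Total              hk-1          $\sum_i\sum_j(x_{ij}-\bar{x})^2$     —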


Degrees of freedom are additive, as well as sums of squares. The
variation corresponding to the factors of classification is represented
by the first two sums in the second member of (9). The uncontrolled
variation represented by the residual sum $\sum_i\sum_j Y_{ij}^2$ is
due to a variety of causes, which are grouped under the term
‘error’.

To test the hypothesis of homogeneity in the population, we 
compare the two estimates of variance obtained between classes and 
between groups with that obtained from error. If the factor corresponding
to either classification has a significant effect upon the
value of the variable, this will appear in the corresponding mean
square. In order to test the variance ratio by means of the F table,*
the population must be assumed normal. If either of the first two
estimates of variance is significantly different from that obtained 
from error, the hypothesis of homogeneity is discredited. The 
significance of the difference of the means of any two classes, or 
any two groups, may be tested by means of the t table, as in the 
example below. 

Example. An agricultural experiment was conducted to test the effects
of change of soil (5 blocks) and variety of wheat (7 different strains) on the
yield of grain. Each block was divided into seven plots, and the plots of each
block were assigned at random to the seven varieties. The yields, in bushels
per acre, are set out in the same rectangular array as in the example of
§ 99, columns corresponding to varieties and rows to blocks. Discuss the
significance of the variation of yield with the two factors.

* The independence of these estimates of variance may be established 
by Cochran’s method. See 1934, 6.









With origin at x = 12 the values of T, $T_i$, the total sum of squares and
the sum of squares corresponding to varieties are as already found. Show
similarly that

$$T_j = 20,\ -6,\ 6,\ 28,\ -6;$$
$$\bar{x}_j = 2.86,\ -0.86,\ 0.86,\ 4,\ -0.86;$$
$$\sum_j T_j^2/7 = 184.6, \qquad \sum_j T_j^2/7 - T^2/35 = 134.2.$$

The tabulation of results is:

Source of variation    D.F.    Sum of squares    Mean square    F
Varieties                6          41.6             6.93        4.2**
Blocks                   4         134.2            33.55       20.2**
Error                   24          39.8             1.66        —
Total                   34         215.6              —          —


The double asterisk indicates that the value of F is significant at the 1 % level 
(a single asterisk denoting significance at the 5 % level). Thus the yields of
the varieties are significantly different; and the experiment indicates a very 
marked variation in soil fertility from one block to another. 

We may use the t test to examine more closely the difference between the
mean yields of any two varieties, say the second and third. The difference is
$\bar{x}_2 - \bar{x}_3 = 1.8$. To test the significance of this we use the estimate of variance
obtained from ‘error’, and corresponding to 24 d.f. This value, 1.66, is the
estimated variance of the yield of a single plot. The estimated variance of
the mean of 5 plots is thus 1.66/5; and that of the difference of the means of
two independent samples of 5 plots each is 1.66 x 2/5 = 0.664. The s.e. of
the difference of the means of two varieties is therefore 0.815 nearly. The
value of t for the above two varieties is thus 1.8/0.815 = 2.2. For 24 d.f.
this is significant at the 5 % level. The least difference, m, that is significant
is given by m/0.815 = 2.06, so that m = 1.68 nearly. Show also that the
s.e. of the difference between the mean yields for two blocks is 0.69, and that
a difference of 1.4 is significant at the same level.
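
As a check on the figures of this example, the resolution (9) and the standard errors used in the t test may be sketched in Python. This is an illustrative sketch only: the variable names are assumptions, and the table is the one with origin at 12 from the example of § 99, rows being taken as blocks.

```python
import math

# Yields with origin at 12: rows = blocks, columns = varieties.
table = [
    [1, 3, 2, 2, 5, 3, 4],
    [-1, -1, -2, -2, 3, -3, 0],
    [-2, 1, 0, 3, 2, 1, 1],
    [4, 6, 1, 5, 7, 2, 3],
    [0, 0, -1, -2, 0, -2, -1],
]
k = len(table)         # number of rows (blocks)
h = len(table[0])      # number of columns (varieties)
N = h * k

T = sum(sum(row) for row in table)
correction = T * T / N
raw = sum(x * x for row in table for x in row)
col_totals = [sum(row[i] for row in table) for i in range(h)]
row_totals = [sum(row) for row in table]

# The four sums in (9); the error sum is found by subtraction.
ss_total = raw - correction
ss_varieties = sum(t * t for t in col_totals) / k - correction
ss_blocks = sum(t * t for t in row_totals) / h - correction
ss_error = ss_total - ss_varieties - ss_blocks

ms_error = ss_error / ((h - 1) * (k - 1))
F_varieties = ss_varieties / (h - 1) / ms_error
F_blocks = ss_blocks / (k - 1) / ms_error

# Standard errors of the difference of two variety means and two block means.
se_varieties = math.sqrt(2 * ms_error / k)
se_blocks = math.sqrt(2 * ms_error / h)
t_2_vs_3 = (col_totals[1] - col_totals[2]) / k / se_varieties

print(round(ss_blocks, 1), round(ss_error, 1), round(F_varieties, 1), round(F_blocks, 1))
print(round(se_varieties, 3), round(se_blocks, 2), round(t_2_vs_3, 1))
```

The significance points (2.06 for t with 24 d.f., and the 5 % and 1 % values of F) are taken from the tables as in the text; the sketch does not compute them.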

101. The Latin square. Three criteria of classification 

In the general case of three criteria of classification, corresponding 
to h,k,p classes respectively, we should require a three-dimensional 
generalization of the rectangular array employed above. In the 
particular case for which h = k = p this requirement is obviated by
an arrangement known as the Latin square. As it is customary to
use the letters A, B, C, ... to distinguish the different classes of one
of the three classifications, we may conveniently explain the Latin
square, of order n, as an arrangement of the n letters A, B, C, ...
in the form of a square array, such that each letter occurs once in 
each column and once in each row. Consequently each letter occurs 
n times in a Latin square of order n. The accompanying array is one 
arrangement for a Latin square of order five. 


    B  D  E  A  C
    C  A  B  E  D
    D  C  A  B  E
    E  B  C  D  A
    A  E  D  C  B


The triple classification and the Latin square arrangement are 
illustrated by the design of the following agricultural experiment. 
The variable is the yield of grain per acre, and the object of the
experiment is to test the effect on yield due to change of manurial
treatment, and to variation of soil in each of two perpendicular
directions. A block of land is divided into plots, arranged in n
parallel rows in one of the given directions and n parallel columns in
the perpendicular direction. The n different treatments are distributed
at random among the n plots of each row, but in such a
way that no two plots in the same column are given the same treatment.
We thus have a Latin square in which letters correspond to
treatments, while rows and columns correspond to soil variation in 
the given perpendicular directions. 
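
The defining property of the Latin square is easily verified by machine. The sketch below is an illustration in Python and not part of the original text; the function names are assumptions. It checks the arrangement of order five printed above, and also produces a randomized layout by shuffling the rows and columns of a cyclic square. That randomization is only one simple possibility and does not select uniformly from all Latin squares of the given order.

```python
import random


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({row[c] for row in square} == symbols for c in range(n))


def random_latin_square(letters, rng):
    """Shuffle the rows and columns of a cyclic square built on the given letters."""
    n = len(letters)
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[letters[(r + c) % n] for c in cols] for r in rows]


# The arrangement of order five given in the text, one string per row.
printed = ["BDEAC", "CABED", "DCABE", "EBCDA", "AEDCB"]
layout = random_latin_square("ABCDE", random.Random(0))

print(is_latin_square(printed), is_latin_square(layout))
```

Permuting whole rows or whole columns of a Latin square leaves each letter once in every row and column, which is why the shuffled cyclic square remains a valid design.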

The necessary formulae for calculation are obtained by an easy 
extension of the foregoing results. With the same notation as in the 
preceding sections, the resolution expressed by (9) is still valid. The
final sum $\sum_i\sum_j Y_{ij}^2$ may be separated into components by applying
(3) to the variable $Y_{ij}$ and resolving, not with respect to the means
of rows or columns, but with respect to the means of letters. Let
$\bar{Y}$ denote the general mean of the values $Y_{ij}$ and $\bar{Y}_l$ the mean of the
values for the lth letter. Then, since the mean of each of the four
terms in $Y_{ij}$ has the same magnitude $\bar{x}$, we have $\bar{Y} = 0$. To calculate
$\bar{Y}_l$ we consider the four terms of $Y_{ij}$ separately. The mean value of
$x_{ij}$ corresponding to the lth letter we shall denote by $\bar{x}_l$. The mean
of the values $\bar{x}_i$ for the lth letter is $\bar{x}$, since each column contains the
lth letter once only. Similarly the mean of the values $\bar{x}_j$ for the lth
letter is $\bar{x}$. The last term of $Y_{ij}$ is constant, and we have the result

$$\bar{Y}_l = \bar{x}_l - \bar{x} - \bar{x} + \bar{x} = \bar{x}_l - \bar{x}.$$

Thus, on applying (3) to the values $Y_{ij}$ and the letters, we deduce

$$\sum_i\sum_j (Y_{ij}-\bar{Y})^2 = n\sum_l (\bar{Y}_l-\bar{Y})^2 + \sum_i\sum_j (Y_{ij}-\bar{Y}_l)^2,$$

or its equivalent

$$\sum_i\sum_j Y_{ij}^2 = n\sum_l (\bar{x}_l-\bar{x})^2 + \sum_i\sum_j Z_{ij}^2.$$

Substituting this value in (9), and putting h = k = n, we obtain
the required resolution

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = n\sum_i (\bar{x}_i-\bar{x})^2 + n\sum_j (\bar{x}_j-\bar{x})^2 + n\sum_l (\bar{x}_l-\bar{x})^2 + \sum_i\sum_j Z_{ij}^2, \tag{10}$$

where $Z_{ij} = x_{ij}-\bar{x}_i-\bar{x}_j-\bar{x}_l+2\bar{x}$.


The expected value of the first member of (10) is $(n^2-1)\sigma^2$, and
on the assumption of a homogeneous population, that of each of the
first three sums on the right is $(n-1)\sigma^2$, as proved in § 98. Consequently

$$E\Bigl(\sum_i\sum_j Z_{ij}^2\Bigr) = [n^2-1-3(n-1)]\sigma^2 = (n-1)(n-2)\sigma^2.$$

Thus each of the first three sums in the second member of (10),
divided by (n-1), gives an unbiased estimate of $\sigma^2$ based on
(n-1) d.f.; and the final sum, divided by (n-1)(n-2), gives an
unbiased estimate based on that number of degrees of freedom. In the case
of a normal population the various sums in (10), divided by $\sigma^2$, are
distributed like $\chi^2$ with d.f. equal to those of the corresponding
estimates of $\sigma^2$. The results may be tabulated as shown below.
The first three mean squares are compared with that obtained from
error in the usual manner. The significance of the difference of the
means of two classes may be tested by the t table as illustrated
earlier.




The various sums of squares may be calculated from the usual 
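formulae, putting $N = n^2$ and h = n. Thus

$$\sum_i\sum_j (x_{ij}-\bar{x})^2 = \sum_i\sum_j x_{ij}^2 - T^2/n^2.$$

Similarly,

$$n\sum_l (\bar{x}_l-\bar{x})^2 = \sum_l T_l^2/n - T^2/n^2,$$

where $T_l$ is the sum of the values for the lth letter; and so on. By
subtraction we have therefore

$$\sum_i\sum_j Z_{ij}^2 = \sum_i\sum_j x_{ij}^2 - \Bigl(\sum_i T_i^2 + \sum_j T_j^2 + \sum_l T_l^2\Bigr)/n + 2T^2/n^2.$$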

It is usual to find this sum by subtraction after the other sums 
have been calculated. 


Source of variation    D.F.          Sum of squares                       Mean square
Columns                n-1           $n\sum_i(\bar{x}_i-\bar{x})^2$       The quotient of ‘sum
Rows                   n-1           $n\sum_j(\bar{x}_j-\bar{x})^2$       of squares’ by d.f. in
Treatments             n-1           $n\sum_l(\bar{x}_l-\bar{x})^2$       each case
Error                  (n-1)(n-2)    $\sum_i\sum_j Z_{ij}^2$
Total                  n^2-1         $\sum_i\sum_j(x_{ij}-\bar{x})^2$     —


Example. An agricultural experiment was conducted on the Latin square
plan to test the effect on yield due to change of treatment (5 kinds) and also
to variation of soil in each of two perpendicular directions. The results are
set out in the Latin square below (n = 5), in which letters correspond to
treatments, while rows and columns correspond to the two perpendicular
directions. Are the effects on yield significant?


    A  7.4    D  8.9    E  5.8    B 12.0    C 14.3
    C 11.8    B  6.5    A  8.7    E  7.6    D  7.9
    D 10.1    C 17.9    B  9.0    A  8.5    E  7.1
    E  8.8    A 10.1    C 15.7    D 11.1    B  7.4
    B 11.8    E  8.8    D 14.3    C 18.4    A 10.1


Shifting the origin to x = 10 (i.e. reducing each of the above yields by 10)
the reader may easily verify that

$$T_i = -0.1,\ 2.2,\ 3.5,\ 7.6,\ -3.2,$$
$$T_j = -1.6,\ -7.5,\ 2.6,\ 3.1,\ 13.4; \qquad T = 10,$$
$$T_l = -5.2,\ -3.3,\ 28.1,\ 2.3,\ -11.9.$$

Consequently $T^2/N = 10 \times 10/25 = 4.00$.

The various sums of squares are given by

$$\sum_i\sum_j x_{ij}^2 - T^2/N = 285.18 - 4.00 = 281.18 \quad \text{(Total)}$$
$$\sum_i T_i^2/n - T^2/N = 17.02 - 4.00 = 13.02 \quad \text{(Columns)}$$
$$\sum_j T_j^2/n - T^2/N = 50.95 - 4.00 = 46.95 \quad \text{(Rows)}$$
$$\sum_l T_l^2/n - T^2/N = 194.89 - 4.00 = 190.89 \quad \text{(Treatments)}$$
$$\text{Remainder} = 30.32 \quad \text{(Error)}$$

The tabulation is:

Source of variation    D.F.    Sum of squares    Mean square    F
Columns                  4          13.02            3.255       1.3
Rows                     4          46.95           11.737       4.6*
Treatments               4         190.89           47.722      18.9**
Error                   12          30.32            2.527       —
Total                   24         281.18             —          —


For $\nu_1 = 4$ and $\nu_2 = 12$ the 5 and 1 % values of F are 3.26 and 5.41. Thus
the variation in rows is significant, and that due to treatments is highly
significant.

Show that the s.e. of the difference of the means of two classes is 1.005, and
hence that a difference of means less than 2.2 is not significant at the 5 % level.
Thus the yield due to treatment C is significantly greater than that due to 
any other treatment; and the yield due to D is significantly greater than 
that due to E. 
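
The computation for the Latin square may be sketched as follows in Python. This is an illustration only and not part of the original text: the variable names are assumptions, and the yields are entered as read from the table of the example, with doubtful digits of the printed copy chosen so that the totals quoted in the text are reproduced.

```python
import math

# Treatment letters and yields, row by row, as in the Latin square of the example.
letters = ["ADEBC", "CBAED", "DCBAE", "EACDB", "BEDCA"]
yields = [
    [7.4, 8.9, 5.8, 12.0, 14.3],
    [11.8, 6.5, 8.7, 7.6, 7.9],
    [10.1, 17.9, 9.0, 8.5, 7.1],
    [8.8, 10.1, 15.7, 11.1, 7.4],
    [11.8, 8.8, 14.3, 18.4, 10.1],
]
n = 5
x = [[v - 10 for v in row] for row in yields]     # origin shifted to 10

T = sum(sum(row) for row in x)
correction = T * T / n ** 2                        # T^2 / N with N = n^2
raw = sum(v * v for row in x for v in row)

col_totals = [sum(row[i] for row in x) for i in range(n)]
row_totals = [sum(row) for row in x]
trt_totals = {}
for i in range(n):
    for j in range(n):
        trt_totals[letters[i][j]] = trt_totals.get(letters[i][j], 0.0) + x[i][j]


def ss(totals):
    """Sum of squares for a classification with n values in each class."""
    return sum(t * t for t in totals) / n - correction


ss_total = raw - correction
ss_cols, ss_rows, ss_trt = ss(col_totals), ss(row_totals), ss(trt_totals.values())
ss_error = ss_total - ss_cols - ss_rows - ss_trt   # found by subtraction

ms_error = ss_error / ((n - 1) * (n - 2))
F = {name: s / (n - 1) / ms_error
     for name, s in [("columns", ss_cols), ("rows", ss_rows), ("treatments", ss_trt)]}
se_diff = math.sqrt(2 * ms_error / n)              # s.e. of a difference of two class means

print({name: round(value, 1) for name, value in F.items()}, round(se_diff, 3))
```

The variance ratios are then compared with the tabulated 5 % and 1 % values of F for 4 and 12 d.f., as in the text.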

102. Significance of an observed correlation ratio 

We shall next consider some applications of analysis of variance†
to testing the significance of an observed correlation ratio, coefficient
or index, and to testing the linearity of a regression. First suppose
that a value, $\eta$, of the correlation ratio of y on x, is obtained from a
random sample of N pairs of values from a bivariate population 
in which y is normally distributed. We wish to test whether this 
value is significant of an association between the two variables in 
the population. Let the N pairs of values of the variables be 
arranged in arrays as in Chapter v, so that the y’s are classified 

† According to the definition given by Wishart and Sanders (see § 105,
below) these applications should be classified under ‘Analysis of Covariance’.




according to the corresponding values of x. Then since the subscript
i indicates the array, and j the position in that array, we have as in (3)

$$\sum_i\sum_j (y_{ij}-\bar{y})^2 = \sum_i\sum_j (y_{ij}-\bar{y}_i)^2 + \sum_i n_i(\bar{y}_i-\bar{y})^2, \tag{11}$$

$\bar{y}_i$ being the mean value of y in the ith array, and $n_i$ the number of
values in that array. In virtue of § 34 this equation corresponds to
the identity

$$NS_y^2 = NS_y^2(1-\eta^2) + NS_y^2\eta^2, \tag{12}$$

$S_y^2$ being the variance of y in the sample.

On the assumption that there is no association between the two 
variables in the population, the y’s in each array may be regarded
as a random sample from the population of y’s. The various sums
in (11), divided by the variance $\sigma^2$ of y in the population, are then
distributed like $\chi^2$ with (N-1), (N-h) and (h-1) d.f., h being the
number of arrays. The problem is therefore the same as in § 98.
On taking expected values of both members of (11) we have

$$(N-1)\sigma^2 = (N-h)\sigma^2 + (h-1)\sigma^2. \tag{13}$$

The various sums in (11), divided by the corresponding coefficients
in (13), yield unbiased estimates of $\sigma^2$. The tabulation is:


Source of variation    D.F.    Sum of squares         Mean square
Between arrays         h-1     $NS_y^2\eta^2$         $NS_y^2\eta^2/(h-1)$
Within arrays          N-h     $NS_y^2(1-\eta^2)$     $NS_y^2(1-\eta^2)/(N-h)$
Total                  N-1     $NS_y^2$               —


To test whether the mean square between arrays is significantly
greater than that within arrays we have

$$F = \frac{\eta^2}{1-\eta^2}\cdot\frac{N-h}{h-1},$$

with $\nu_1 = h-1$, $\nu_2 = N-h$.


Example. A random sample of 79 pairs, arranged in 7 arrays of y’s, gave
a correlation ratio of y on x equal to 0.4. Is this value significant?

Here $\nu_1 = 6$, $\nu_2 = 72$,

$$F = \frac{0.16}{0.84}\cdot\frac{72}{6} = \frac{16}{7} = 2.29^{*}.$$

Since the 5 % value of F is 2.23, the above value is significant at that level.
We conclude that there is association between the variables in the population. 
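
The variance ratio for a correlation ratio may be sketched in Python as below. The sketch is illustrative and not part of the original text: the function names are assumptions, and the small set of arrays used in the second part is hypothetical data introduced only to show that the ratio of mean squares and the formula in terms of the squared correlation ratio agree.

```python
def correlation_ratio_squared(arrays):
    """Squared correlation ratio: sum of squares between arrays over the total sum."""
    values = [y for arr in arrays for y in arr]
    N = len(values)
    T = sum(values)
    correction = T * T / N
    total = sum(y * y for y in values) - correction
    between = sum(sum(arr) ** 2 / len(arr) for arr in arrays) - correction
    return between / total


def variance_ratio(eta_sq, N, h):
    """F for a correlation ratio, with h - 1 and N - h degrees of freedom."""
    return eta_sq / (1 - eta_sq) * (N - h) / (h - 1)


# The example of the text: 79 pairs in 7 arrays, correlation ratio 0.4.
F_example = variance_ratio(0.4 ** 2, 79, 7)

# Hypothetical arrays of y values (not from the text).
arrays = [[2.1, 2.9, 3.4], [3.8, 4.4, 4.1, 5.0], [4.9, 6.2, 5.5]]
eta_sq = correlation_ratio_squared(arrays)
F_arrays = variance_ratio(eta_sq, N=10, h=3)

print(round(F_example, 2), round(eta_sq, 3), round(F_arrays, 2))
```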








103. Significance of a regression function 

To test the significance of a regression function is to examine 
whether the data of the sample indicate any degree of association 
of the variables, of the type represented by the regression equation. 
We shall see that, in the case of a linear regression equation, this
amounts to testing the significance of an observed value of r; while,
in the case of an equation of curvilinear regression, it is equivalent
to testing the significance of an observed value of the correlation
index R.

Consider first a linear regression equation for a random sample 
of N pairs of values from a bivariate normal population. If $Y_i$ is the
estimate of $y_i$ given by the regression equation, we know by § 27 (31) that

$$\sum_i (y_i-\bar{y})^2 = \sum_i (y_i-Y_i)^2 + \sum_i (Y_i-\bar{y})^2. \tag{14}$$

The first sum of squares in the second member is due to deviations
from the regression function, and the second sum to the regression
function itself. In virtue of § 27 the various sums in (14) are equal
to the corresponding terms in the identity

$$NS_y^2 = NS_y^2(1-r^2) + NS_y^2r^2.$$

On the assumption of an uncorrelated normal population these sums,
divided by $\sigma^2$, are distributed like $\chi^2$ with N-1, N-2 and 1 d.f.
respectively, as proved in § 82. The expected values of the various
sums in (14) are therefore the corresponding terms of the identity

$$(N-1)\sigma^2 = (N-2)\sigma^2 + \sigma^2,$$

and each sum thus gives an unbiased estimate of $\sigma^2$, on division by
the appropriate number of d.f. The tabulated results are:


Source of variation        D.F.    Sum of squares               Mean square
Regression function        1       $\sum_i(Y_i-\bar{y})^2$      $Nr^2S_y^2$
Deviation from the         N-2     $\sum_i(y_i-Y_i)^2$          $NS_y^2(1-r^2)/(N-2)$
regression function
Total                      N-1     $\sum_i(y_i-\bar{y})^2$      —


If the mean square due to the regression function is significantly 
greater than that due to deviations from this function, we conclude
that there is a real association between the variables, of the type
indicated by the regression equation. To test the significance we
have for the variance ratio

$$F = (N-2)r^2/(1-r^2); \qquad \nu_1 = 1,\ \nu_2 = N-2.$$

The method is thus equivalent to testing the significance of r; and
it will be noted that the above statistic F is the square of the statistic
t of § 89 (8). The two tests are therefore equivalent.

Example. A correlation of 0.5 is obtained from a random sample of 26
pairs from a normal population. Is this value significant?

Here $r = \tfrac{1}{2}$, N = 26, F = 8**, $\nu_1 = 1$, $\nu_2 = 24$.

From the table we see that this value of F is highly significant of correlation 
in the population. 

Obtain the same result by use of the t table. 
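
The equivalence of the two tests in this example may be illustrated numerically. The following Python sketch is not part of the original text; the function names are assumptions, and the t statistic is written in the usual form $t = r\sqrt{N-2}/\sqrt{1-r^2}$, which is assumed here to be the statistic referred to in § 89 (8).

```python
import math


def regression_f(r, N):
    """Variance ratio for a linear regression, with 1 and N - 2 degrees of freedom."""
    return (N - 2) * r ** 2 / (1 - r ** 2)


def correlation_t(r, N):
    """The t statistic for an observed correlation coefficient, with N - 2 d.f."""
    return r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)


F = regression_f(0.5, 26)
t = correlation_t(0.5, 26)

print(F, round(t, 3), round(t * t, 6))
```

Since F is the square of t, a value of F exceeding a given significance point for 1 and N - 2 d.f. corresponds to a value of t exceeding the square root of that point in absolute value.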

In the case of a correlation index R associated with a curved
regression line, the formulae of § 41 show that the above argument
holds if $r^2$ is replaced by $R^2$, except that the numbers of d.f. must be
altered. If k is the number of statistics that must be calculated from
the sample to obtain the regression equation, the number of d.f.
associated with deviations from the regression function is N-k,
and the number associated with the regression function itself is
k-1. For testing the significance of the regression function we have
therefore

$$F = \frac{R^2}{1-R^2}\cdot\frac{N-k}{k-1}; \qquad \nu_1 = k-1,\ \nu_2 = N-k.$$

104. Test for non-linearity of regression 

Considering the random sample of N pairs of values from a bivariate
normal population, let us return to equation (11). In virtue
of § 36 (18), the final sum in that equation may be further resolved as

$$\sum_i n_i(\bar{y}_i-\bar{y})^2 = \sum_i n_i(\bar{y}_i-Y_i)^2 + \sum_i n_i(Y_i-\bar{y})^2, \tag{15}$$

$Y_i$ denoting the value given for the ith array by the line of regression
of y on x, the various sums being equal to the corresponding terms of the
identity

$$N\eta^2S_y^2 = N(\eta^2-r^2)S_y^2 + Nr^2S_y^2.$$

The three sums in (15), divided by $\sigma^2$, are distributed like $\chi^2$ with
h-1, h-2 and 1 d.f. respectively; and on taking expected values
of both members of (15) we obtain the identity

$$(h-1)\sigma^2 = (h-2)\sigma^2 + \sigma^2.$$

The various sums in (15), divided by the corresponding numbers of
D.F., give unbiased estimates of $\sigma^2$. That obtained from the first
sum on the right is associated with deviations of the means of arrays
from the line of regression of y on x. On the assumption of linearity
of regression this sum is due to sampling errors; and the estimate
of $\sigma^2$ obtained from it should not be significantly greater than that
derived from the sum of squares within arrays, i.e. from

$$\sum_i\sum_j (y_{ij}-\bar{y}_i)^2.$$

Tabulating as usual we have: 


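Source of variation       D.F.    Sum of squares                        Mean square
Linear regression         1       $\sum_i n_i(Y_i-\bar{y})^2$           $Nr^2S_y^2$
Deviation of means        h-2     $\sum_i n_i(\bar{y}_i-Y_i)^2$         $N(\eta^2-r^2)S_y^2/(h-2)$
from regression line
Within arrays             N-h     $\sum_i\sum_j(y_{ij}-\bar{y}_i)^2$    $N(1-\eta^2)S_y^2/(N-h)$
Total                     N-1     $\sum_i\sum_j(y_{ij}-\bar{y})^2$      —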



For testing whether the second mean square is significantly greater
than the third, we have the variance ratio

$$F = \frac{\eta^2-r^2}{1-\eta^2}\cdot\frac{N-h}{h-2}; \qquad \nu_1 = h-2,\ \nu_2 = N-h.$$

If the value of F is significant, the assumption of linearity of regression
is discredited. It should be observed that the value of F
depends not only on the difference $\eta^2-r^2$ but also on N and h.
Thus a knowledge of $\eta^2-r^2$ is by itself insufficient to decide the
question of linearity of regression.



Example. A random sample of 200 pairs of values from a bivariate normal
population, when grouped in 10 arrays of y’s, gave values r = 0.3 and $\eta$ = 0.4.
Are these results consistent with the assumption of linearity of regression?
Here N = 200, h = 10, $\nu_1 = 8$, $\nu_2 = 190$ and

$$F = \frac{0.07}{0.84}\cdot\frac{190}{8} = 1.98.$$

This value is just on the 5 % level of significance. The assumption of linearity
of regression is thus rather discredited. 
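
The remark that F depends on N and h as well as on the difference of the squared correlation ratio and the squared coefficient may be illustrated by a short Python sketch. It is not part of the original text; the function name is an assumption, and the second call uses a hypothetical smaller sample with the same values of r and the correlation ratio.

```python
def linearity_f(eta, r, N, h):
    """Variance ratio for deviation of array means from the regression line.

    Returns (F, nu1, nu2) with nu1 = h - 2 and nu2 = N - h.
    """
    eta_sq, r_sq = eta ** 2, r ** 2
    nu1, nu2 = h - 2, N - h
    return (eta_sq - r_sq) / (1 - eta_sq) * nu2 / nu1, nu1, nu2


# The example of the text: N = 200, h = 10, r = 0.3, correlation ratio 0.4.
F, nu1, nu2 = linearity_f(0.4, 0.3, 200, 10)

# Hypothetical smaller sample with the same r and correlation ratio.
F_small, nu1_small, nu2_small = linearity_f(0.4, 0.3, 50, 10)

print(round(F, 2), (nu1, nu2), round(F_small, 2), (nu1_small, nu2_small))
```

With the same two coefficients, the smaller sample gives a much smaller variance ratio, so that the same difference would not there discredit linearity.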


Analysis of Covariance

105. Resolution of the 'sum of products'. One criterion of
classification 

Analysis of covariance has been described as ‘the technique of
testing for homogeneity in problems dealing with two or more
correlated variables’.* Its main use is to test the significance of
the difference between the mean values of a variate y in certain 
classes, when these have been corrected for differences in some 
concomitant variable x. The method of doing this will be explained 
shortly; but we must first study the necessary algebraical tools. 

The resolution of the total variation into components, already 
studied in analysis of variance, has its counterpart in analysis of 
covariance; and the algebra of the two processes runs along parallel 
lines. The total covariation of a bivariate sample, represented by 
the sum of the products of the deviations of the variates from their 
means, may be resolved into components associated with different 
factors; and from these components, and the corresponding components
of variation of x and of y, estimates of the coefficient of
regression (or correlation) in the population are determined. The
estimate from ‘error’ is tested for significance as in §§ 103, 89 or 90;
and, if this proves to be significant, the other estimates are tested 
by comparison with it. We are thus able to estimate the effects of 
the various factors on the degree of association of the variates. 

* Wishart and Sanders, 1936, 3, p. 46.

Suppose that the data consist of N pairs of corresponding values
of the two variates, x and y, and that these may be grouped in h
different classes according to a certain criterion. With the double
suffix notation, $(x_{ij}, y_{ij})$ is the jth pair of values in the ith class
(i = 1, ..., h). The numbers of pairs in the different classes are not
necessarily equal. Let $n_i$ be the number of pairs in the ith class, so
that $\sum_i n_i = N$. As before let $\bar{x}$, $\bar{y}$ denote the general means of the
two variables, and $\bar{x}_i$, $\bar{y}_i$ the means of their values in the ith class.
Then

$$\sum_j (x_{ij}-\bar{x}_i) = 0 = \sum_j (y_{ij}-\bar{y}_i). \tag{16}$$

The deviations of $x_{ij}$ and $y_{ij}$ from their general means are expressible as

$$x_{ij}-\bar{x} = (x_{ij}-\bar{x}_i) + (\bar{x}_i-\bar{x}),$$
$$y_{ij}-\bar{y} = (y_{ij}-\bar{y}_i) + (\bar{y}_i-\bar{y}).$$

On forming their product, and summing over all pairs of values,
we have

$$\sum_i\sum_j (x_{ij}-\bar{x})(y_{ij}-\bar{y}) = \sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) + \sum_i n_i(\bar{x}_i-\bar{x})(\bar{y}_i-\bar{y}), \tag{17}$$

the remaining sums disappearing in virtue of (16). Corresponding
to this we have also the resolution of the sum of squares of the x’s,
expressed by (3), and that of the y’s given by a similar formula.

Suppose now that the N pairs of values constitute a random 
sample from a homogeneous population, in which the covariance
is $\mu_{11}$. Then, by § 61, the expected value of the sum in the first
member of (17) is $(N-1)\mu_{11}$. Similarly, since the values in the ith
class may be regarded as a random sample of $n_i$ pairs,

$$E\Bigl(\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i)\Bigr) = (n_i-1)\mu_{11},$$

so that

$$E\Bigl(\sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i)\Bigr) = \sum_i (n_i-1)\mu_{11} = (N-h)\mu_{11}.$$

Consequently, since the expectations of the two members of (17)
must be equal, that of the final sum in the equation must be
$(h-1)\mu_{11}$. Thus the various sums in (17), divided by N-1, N-h and
h-1 respectively, give unbiased estimates of the covariance of the
population, based on numbers of d.f. represented by these divisors.



Following Wishart and Sanders* we may conveniently denote
the three sums in (17) by C'', C' and C respectively, the equation
being then equivalent to C'' = C' + C. The corresponding sums of
squares for the x’s will be denoted A'', A' and A, and those for the
y’s by B'', B' and B. Then (3) and its counterpart are equivalent to

$$A'' = A' + A, \qquad B'' = B' + B.$$

The resolution of sums of squares and products may be combined
in a single table as follows:


Source of          D.F.    Sums of     Sum of      Coefficient       Coefficient
variation                  squares     products    of regression     of correlation
Between classes    h-1     A, B        C           b = C/A           r = C/√(AB)
Within classes     N-h     A', B'      C'          b' = C'/A'        r' = C'/√(A'B')
Total              N-1     A'', B''    C''         —                 —


We thus obtain different estimates of the coefficients of regression 
and correlation. By means of §§ 103, 89 or 90 we first test whether 
the estimate obtained within classes is significant of correlation in 
the population. If it proves to be significant, we proceed to test the 
differences between the class means after these have been corrected 
for regression. Incidentally we may also test the significance of the 
difference between b and b'.

106. Calculation of the sums of products 

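The sum of products of the deviations of N pairs of numbers
$(x_s, y_s)$ (s = 1, ..., N) from their means $\bar{x}$, $\bar{y}$ is given by

$$\sum_s (x_s-\bar{x})(y_s-\bar{y}) = \sum_s x_sy_s - \bar{x}\sum_s y_s - \bar{y}\sum_s x_s + N\bar{x}\bar{y} = \sum_s x_sy_s - TT'/N, \tag{18}$$

where $T = \sum_s x_s = N\bar{x}$, $T' = \sum_s y_s = N\bar{y}$.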

Applying this formula to the sums in (17) we have

$$C'' = \sum_i\sum_j (x_{ij}-\bar{x})(y_{ij}-\bar{y}) = \sum_i\sum_j x_{ij}y_{ij} - TT'/N, \tag{19}$$

T, T' being the grand totals for x and y respectively. Similarly, on
summing first for the values in the ith class, we have

$$C' = \sum_i\sum_j (x_{ij}-\bar{x}_i)(y_{ij}-\bar{y}_i) = \sum_i\Bigl(\sum_j x_{ij}y_{ij} - T_iT'_i/n_i\Bigr),$$

where

$$T_i = \sum_j x_{ij} = n_i\bar{x}_i, \qquad T'_i = \sum_j y_{ij} = n_i\bar{y}_i,$$

these being the sums of the values of x and y in the ith class. Consequently

$$C' = \sum_i\sum_j x_{ij}y_{ij} - \sum_i T_iT'_i/n_i. \tag{20}$$

Subtraction of (20) from (19) gives the remaining sum

$$C = \sum_i T_iT'_i/n_i - TT'/N, \tag{21}$$

which may, of course, be obtained independently.

* Wishart and Sanders, 1936, 3, p. 48.

Since the deviations from the means are independent of the choice 
of origin, the results obtained by using (19), (20) and (21) are 
unaltered by a change of the origins of x and y. In other words, if 
all the values $x_{ij}$ are decreased (or increased) by the same constant,
and all the values $y_{ij}$ by another constant, the results obtained for
the three sums of products are unchanged. The arithmetic may often 
be simplified in this manner, large numbers being replaced by much 
smaller ones. 
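
Formulae (19), (20) and (21) may be sketched in Python as follows. The sketch is illustrative and not part of the original text: the function name is an assumption and the classes of pairs are hypothetical data, with unequal numbers in the classes. It also checks that a change of the origins of x and y leaves the three sums of products unaltered.

```python
def sums_of_products(classes):
    """Return (total, within, between) sums of products, i.e. (C'', C', C).

    `classes` is a list of classes, each a list of (x, y) pairs.
    """
    pairs = [p for cls in classes for p in cls]
    N = len(pairs)
    T = sum(x for x, _ in pairs)              # grand total of x
    T_dash = sum(y for _, y in pairs)         # grand total of y
    raw = sum(x * y for x, y in pairs)
    class_term = sum(
        sum(x for x, _ in cls) * sum(y for _, y in cls) / len(cls) for cls in classes
    )                                         # sum of T_i T'_i / n_i
    correction = T * T_dash / N               # T T' / N
    return raw - correction, raw - class_term, class_term - correction


# Hypothetical data: three classes of (x, y) pairs of unequal size.
classes = [
    [(4, 9.0), (6, 8.5), (5, 8.0)],
    [(9, 6.0), (11, 5.5)],
    [(2, 11.0), (3, 10.0), (1, 12.5), (2, 10.5)],
]
C_total, C_within, C_between = sums_of_products(classes)

# Shift the origins of x and y by different constants.
shifted = [[(x - 5, y - 8.0) for x, y in cls] for cls in classes]
C_total_s, C_within_s, C_between_s = sums_of_products(shifted)

print(round(C_total, 4), round(C_within, 4), round(C_between, 4))
```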


107. Examination and elimination of the effect of regression 

The method of § 103, for testing the significance of an estimated 
regression, consists in resolving the total sum of squares of the y’s 
into two components, one due to regression and the other to deviations
from the regression line, and then comparing the estimates
of population variance derived from these two sums. It will be
observed that the test was applied to a sample without any attempt
at classification, that is to say, quite apart from any resolution of
the variation into components due to factors other than regression.
In terms of the total sums of squares and products, A'', B'', C'',
the sum of squares due to regression is expressible as

$$NS_y^2r^2 = C''^2/A'', \tag{22}$$

and that due to deviation from the regression line is $B'' - C''^2/A''$.
The former corresponds to 1 d.f. and the latter to N-2, which is
one less than for the total sum of squares.




Now we have seen how, after the values in the sample have been 
classified, we may resolve the total variation of the variables, and 
their covariation, into components between the means of classes 
and within classes respectively. From each set of components we 
may calculate an estimate of the regression coefficient and a line 
of regression; and the corresponding variation of the y’s may be
further resolved by the method of § 103 into a part due to regression,
and another due to deviation from the regression line, the number
of D.F. for the latter being one less than the total d.f. for that component.
Applying this process to each line of the table at the end
of § 105, we find from each a sum of squares of deviations from the 
corresponding line of regression, with d.f. as indicated in the 
following table:* 


Source of variation    D.F.       Residual sum of squares
Between classes        h-2        $B - C^2/A$
Within classes         N-h-1      $B' - C'^2/A'$
Total                  N-2        $B'' - C''^2/A''$


leading to different estimates of the variance of y in the population.
Denoting the estimate within classes by $s^2$ we have

$$s^2 = \frac{B'-C'^2/A'}{N-h-1}. \tag{23}$$

This is the estimate of the variance of y after correction for regression.
The significance of any other estimate of the variance is tested
by comparison with it.

The significance of any apparent regression is first tested by the
method of § 103, applied to the sums within classes; that is to say,
we compare the estimate $C'^2/A'$ of variance, due to regression and
based on 1 d.f., with the above estimate based on N-h-1 d.f.
If the regression proves to be significant we proceed to test the
differences between class means after correction for regression.
Subtracting the second row of the above table from the third we
have the sum of squares $B + C'^2/A' - C''^2/A''$ with h-1 d.f., which
we also compare with the sum of squares within classes. If it proves
to be significant we conclude that there are differences between the
class means, after these have been adjusted by the regression
coefficient b' within classes.

* Cf. Wishart and Sanders, 1936, 3, p. 49.

Now in the first line of the above table we have a sum of squares
between class means, adjusted by the regression coefficient b, and
corresponding to (h-2) D.F. The testing of this sum in place of the
above raises the question of the significance of the difference between
b and b'. This may be examined as follows. Tabulating the two sums
of squares just mentioned, and their D.F., we may arrange the work:


Between classes         D.F.         Sum of squares

Adjusted by $b'$        $h-1$        $B + C'^2/A' - C''^2/A''$
Adjusted by $b$         $h-2$        $B - C^2/A$

Difference              1            $C^2/A + C'^2/A' - C''^2/A''$


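Now the sum in the last row may be expressed

$$\frac{C^2}{A} + \frac{C'^2}{A'} - \frac{(C+C')^2}{A+A'} = \frac{AA'}{A+A'}\left(\frac{C}{A} - \frac{C'}{A'}\right)^2 = \frac{AA'(b-b')^2}{A+A'}.$$

Comparing this estimate of variance, based on 1 d.f., with the
estimate based on $N-h-1$, we have the variance ratio

$$F = \frac{AA'(b-b')^2}{A+A'} \times \frac{N-h-1}{B'-C'^2/A'}, \qquad \nu_1 = 1,\ \nu_2 = N-h-1. \tag{24}$$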

A rare value of F indicates that the two coefficients, $b$ and $b'$, are
significantly different; in which case the factor of classification has
an effect upon the degree of association of the variables.

Example. To examine the relation between the yield of grain (x bushels/
acre) and the cost of production (y shillings/bushel) in a certain state, six
districts were chosen at random, and in each a random selection of five farms
was made. The results for the season are tabulated below, columns
corresponding to districts. Is there any significant indication of correlation
between the variables; and, if so, does its value vary with the district?


  District 1    District 2    District 3    District 4    District 5    District 6
   x     y       x     y       x     y       x     y       x     y       x     y

  13    3.5     10    4.7     16    2.8      9    4.5     12    3.8     15    4.7
  11    4.0      9    5.3     13    3.0     12    3.6     17    2.2     11    5.1
  18    2.5      8    5.6     15    2.8     10    4.0     19    2.0      9    6.0
  14    3.7     12    4.0     10    4.2     13    3.4     11    4.6     14    4.3
  12    4.3     11    4.4     12    3.2      8    5.5     14    3.4     17    3.9

Shifting the origin to x = 12, y = 3, and rewriting the table, the student
will easily verify that

$$T_i = 8,\ -10,\ 6,\ -8,\ 13,\ 6; \qquad T = 15,$$
$$T'_i = 3,\ 9,\ 1,\ 6,\ 1,\ 9; \qquad T' = 29,$$

and hence that

$$T^2/N = 7.5, \qquad T'^2/N = 28.03, \qquad TT'/N = 14.5,$$
$$\sum_i T_i^2/5 = 93.8, \qquad \sum_i T_i'^2/5 = 41.8, \qquad \sum_i T_i T'_i/5 = -8.2,$$
$$\sum_i\sum_j x_{ij}^2 = 259, \qquad \sum_i\sum_j y_{ij}^2 = 56.96, \qquad \sum_i\sum_j x_{ij}y_{ij} = -54.8.$$

Consequently

$$A = 93.8 - 7.5 = 86.3, \qquad A' = 259 - 93.8 = 165.2, \qquad A'' = 259 - 7.5 = 251.5,$$
$$B = 41.8 - 28.03 = 13.77, \qquad B' = 56.96 - 41.8 = 15.16, \qquad B'' = 56.96 - 28.03 = 28.93,$$
$$C = -8.2 - 14.5 = -22.7, \qquad C' = -54.8 + 8.2 = -46.6, \qquad C'' = -54.8 - 14.5 = -69.3.$$

The tabulated results are: 


Source of            D.F.    Sum of squares       Sums of      Coefficient
variation                    (x²)      (y²)       products     of regression

Between districts      5      86.3     13.77       -22.7         -0.263
Within districts      24     165.2     15.16       -46.6         -0.282

Total                 29     251.5     28.93       -69.3         -0.276


First, to test the significance of $b'$ we have, as explained above,

$$F = \frac{C'^2/A'}{B'-C'^2/A'} \times (N-h-1) = \frac{13.14 \times 23}{2.02} = 149.6, \qquad \nu_1 = 1,\ \nu_2 = 23.$$

Thus the value of $b'$ is highly significant of negative correlation.

To test the differences of class means, after these have been corrected for
regression, we calculate the estimate of variance

$$\frac{B + C'^2/A' - C''^2/A''}{h-1} = \frac{13.77 + 13.14 - 19.09}{5} = 1.56$$

and compare this with the estimate within classes, $2.02/23 = 0.088$.
The variance ratio is

$$F = 1.56/0.088 = 17.7^{**}$$

which, for $\nu_1 = 5$ and $\nu_2 = 23$, is highly significant. We conclude that the
cost of production, corrected for differences in yield, varies significantly from
district to district.

By using (24) show that $b$ and $b'$ do not differ significantly.
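
The arithmetic of this example can be retraced with a short script. The following is only a sketch in Python, taking the sums of squares and products quoted above as its input; the variable names (`A1`, `B1`, `C1` for the primed within-district sums, `A2`, `B2`, `C2` for the totals) are labels chosen here, not the book's notation. Because it carries unrounded intermediate values, its ratios differ slightly from the rounded figures printed in the text.

```python
# Sums of squares and products quoted in the worked example of § 107.
N, h = 30, 6                          # 30 farms in 6 districts
A, B, C = 86.3, 13.77, -22.7          # between districts
A1, B1, C1 = 165.2, 15.16, -46.6      # within districts (A', B', C')
A2, B2, C2 = A + A1, B + B1, C + C1   # totals (A'', B'', C'')

# Regression coefficients for each line of the table.
b, b1, b2 = C / A, C1 / A1, C2 / A2

# Equation (23): variance of y within classes after correction for regression.
s2 = (B1 - C1**2 / A1) / (N - h - 1)

# Significance of b': regression sum of squares (1 d.f.) against s2.
F_regression = (C1**2 / A1) / s2

# Class means adjusted by b': (B + C'^2/A' - C''^2/A'') on h - 1 d.f.
F_means = ((B + C1**2 / A1 - C2**2 / A2) / (h - 1)) / s2

# Equation (24): difference between b and b' on 1 d.f.
F_slopes = (A * A1 * (b - b1) ** 2 / (A + A1)) / s2

print(round(b, 3), round(b1, 3), round(b2, 3))
print(round(F_regression, 1), round(F_means, 1), round(F_slopes, 2))
```

The last ratio is well below unity, which agrees with the statement that $b$ and $b'$ do not differ significantly.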

108. Two criteria of classification 

Suppose next that the N pairs of values $(x_{ij}, y_{ij})$ in the sample
may be classified according to two different criteria, H and K, the
former determining $h$ different classes and the latter $k$ different
groups. Suppose for simplicity that one pair of values from each
group is present in each class, and vice versa, so that $N = hk$. The
pairs of values may be arranged in a rectangular array of $h$ columns
and $k$ rows, in which columns correspond to classification H and
rows to K. The pair $(x_{ij}, y_{ij})$ belongs to the $i$th class and the $j$th
group, and appears in the $i$th column and the $j$th row.

The resolution of the sum of products expressed in (17) is still 
valid. But the first sum in the second member of the equation may 
be further resolved. Thus if 

and we apply the resolution expressed by (17) to the sum of pro¬ 
ducts S making it however with respect to the means of 

rows instead of the means of columns, we find by the same argument 
as in § 100 that 

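$$\sum_i\sum_j (x_{ij}-\bar x)(y_{ij}-\bar y) = k\sum_i (\bar x_{i\cdot}-\bar x)(\bar y_{i\cdot}-\bar y) + h\sum_j (\bar x_{\cdot j}-\bar x)(\bar y_{\cdot j}-\bar y) + \sum_i\sum_j (x_{ij}-\bar x_{i\cdot}-\bar x_{\cdot j}+\bar x)(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y), \tag{25}$$

in which $\bar x_{i\cdot}$ and $\bar x_{\cdot j}$ stand for the means of the $i$th column and the $j$th row.
Denoting the various sums in this equation by $C''$, $C_1$, $C_2$, $C'$ we may
write it briefly

$$C'' = C_1 + C_2 + C'.$$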

On the assumption that the pairs of values constitute a random
sample from a homogeneous population, the expected value of the
first member of (25) is $(N-1)\mu_{11}$, where $\mu_{11}$ is the covariance in the
population. That of the first sum in the second member has been
shown to be $(h-1)\mu_{11}$, and similarly that of the second sum is
$(k-1)\mu_{11}$. Consequently the expectation of the final sum is
$(h-1)(k-1)\mu_{11}$.
On dividing the various sums by $(N-1)$, $(h-1)$, $(k-1)$ and
$(h-1)(k-1)$ respectively, we thus obtain four unbiased estimates
of the covariance of the homogeneous population, with d.f.
represented by these divisors. In § 100 we obtained the corresponding
estimates of the variance of x, and similar equations give estimates
of the variance of y in the population. From the partial sums we
obtain three independent estimates of the coefficients of regression
and correlation. Denoting the sums of squares of the x's by $A''$,
$A_1$, $A_2$, $A'$ and those of the y's by $B''$, $B_1$, $B_2$, $B'$, we may tabulate
the results:


Source of            D.F.             Sums of squares     Sum of       Coefficient of
variation                             (x²)      (y²)      products     regression

Between classes      $h-1$            $A_1$     $B_1$     $C_1$        $b_1 = C_1/A_1$
Between groups       $k-1$            $A_2$     $B_2$     $C_2$        $b_2 = C_2/A_2$
Error                $(h-1)(k-1)$     $A'$      $B'$      $C'$         $b' = C'/A'$

Total                $N-1$            $A''$     $B''$     $C''$        —


The first step is to test the significance of the regression coefficient
$b'$ obtained from the error sums; and this is done exactly as in § 107,
except that the number of d.f. for error is here $(h-1)(k-1)$. Thus
the residual sum of squares due to error is $B' - C'^2/A'$, with $N-h-k$
d.f., giving the estimate of variance of y

$$s^2 = (B' - C'^2/A')/(N-h-k).$$

With this we compare the estimate $C'^2/A'$ obtained from the regression
sum of squares corresponding to 1 d.f., and the result settles
whether $b'$ is significant. If it is, we proceed to test the significance of
the differences of the means of y in classes (or groups) after these have
been corrected for regression on x. This may be done as follows. Take
the first and third rows of the above table, and by addition form a
combined source of variation, S = (classes + error), with sums of
squares and products $A$, $B$, $C$. Thus we have:


Source of variation     D.F.             Sums of squares and products

Classes                 $h-1$            $A_1$, $B_1$, $C_1$
Error                   $(h-1)(k-1)$     $A'$, $B'$, $C'$

Total, S                $N-k$            $A$, $B$, $C$


where $A = A_1 + A'$, etc. From these we form the corresponding
residual sums of squares as in § 107, viz.


Source of variation     D.F.         Residual sum of squares

Classes                 $h-2$        $B_1 - C_1^2/A_1$
Error                   $N-h-k$      $B' - C'^2/A'$
S                       $N-k-1$      $B - C^2/A$


Subtracting the second row from the third we obtain a residual
sum of squares $B_1 + C'^2/A' - C^2/A$ corresponding to $h-1$ d.f.
Comparing the estimate of variance obtained from this with the
estimate $s^2$ we are able to decide whether the class means differ
significantly after correction for regression.

Incidentally, we may deduce a test for the significance of the
difference $b_1 - b'$. For, on subtracting the residual sums of squares
for classes and error from that for S, we have the sum

$$C_1^2/A_1 + C'^2/A' - C^2/A$$

with 1 d.f. As in § 107 this sum is expressible as

$$A_1A'(b_1-b')^2/(A_1+A'),$$

so that in testing it we are testing the difference $b_1 - b'$. Comparing
it with the residual sum of squares for error we have a variance ratio

$$F = \frac{A_1A'(b_1-b')^2}{A_1+A'} \times \frac{N-h-k}{B'-C'^2/A'}, \qquad \nu_1 = 1,\ \nu_2 = N-h-k. \tag{26}$$

We thus determine whether $b_1$ is significantly different from $b'$;
and we may test $b_2$ by a similar formula.*

* For a numerical illustration see Ex. XI, 15, below.






COLLATERAL READING

Fisher, 1938, 2, chapters vii and viii.
Wishart, 1934, 3; 1936, 2; 1940, 6.
Wishart and Sanders, 1936, 3.
Tippett, 1931, 2, chapters vi, vii, ix, x, xi.
Rider, 1939, 6, chapters viii and ix.
Goulden, 1939, 2, chapters xi, xii, xiii and xv.
Mills, 1938, 1, chapter xv and pp. 681-90.
Snedecor, 1934, 2; 1938, 3, chapters x and xi.
Yates, 1934, 5; 1937, 4 and 1938, 4.
Irwin, 1931, 1 and 1934, 4.


EXAMPLES XI 

1. Verify as follows that the expected value of the final sum in
§ 97 (3) is $(h-1)\sigma^2$. If $\mu$ is the mean of the population,

$$\sum_i n_i(\bar x_i - \bar x)^2 = \sum_i n_i(\bar x_i - \mu)^2 - N(\bar x - \mu)^2.$$

Since $E(\bar x_i-\mu)^2$ is the variance of the mean of a sample of $n_i$ members,
and $E(\bar x-\mu)^2$ is that of the mean of a sample of N members, it follows
that

$$E\sum_i n_i(\bar x_i - \bar x)^2 = \sum_i n_i\frac{\sigma^2}{n_i} - N\frac{\sigma^2}{N} = (h-1)\sigma^2.$$

Verify the mean value of $\sum_i\sum_j (x_{ij} - \bar x_i)^2$ similarly.

2. With the notation of § 100, $X_{ij} = x_{ij} - \bar x_{i\cdot} - \bar x_{\cdot j} + \bar x$, show that
$\sum_i X_{ij} = 0 = \sum_j X_{ij}$. Thus the sum of the X's in any row or in any
column is zero. Only $(h-1)(k-1)$ of the quantities $X_{ij}$ are
independent.

3. To test the significance of the variation of the retail price of a 
certain commodity among four large cities, A, B, C and D, seven 
shops were chosen at random in each city, and the prices observed 
were as follows: 

A: 05. lOd., 05. Id., Qs. Id., 5s. 9d., 5s. dd., 5s. 3d., 5s. Id. 

B: 7s., 6s. lOd., 6s. 8d., 6s. 7d., 6s. 4d., 5s. Sd., 5s. 2d. 

G: 7s. 4id., 7s., 6s. Sd., 5s. Sd., 5s. Sd., 5s. 6d., 5s. 6d. 

D: 68. 7d., 6s. 5d., 68. 4d., 6s. 2d., 6s., 5s. Sd., 5s. 4d. 

Do the data indicate that the prices for the four cities are
significantly different?



The classes correspond to the cities. Expressing the prices in
pence show that the tabulated results are:

Source of variation     D.F.     Sum of squares     Mean square

Between cities            3          94.96             31.65
Within cities            24        1446.00             60.25

Total                    27        1540.96

Thus the mean square between cities is less than that within cities,
and is therefore not significantly large. Neither is it significantly
small because, for $\nu_1 = 24$ and $\nu_2 = 3$, a value of F in the
neighbourhood of 2 is not significant.

Show that the s.e. of the difference of the means for two cities is
4.15d.; and hence for no two cities is the difference of the mean prices
significant.

4. Show that, if $\bar x_1$ and $\bar x_2$ are the means of two classes of $n_1$ and
$n_2$ members respectively, and $s^2$ is the estimate of population variance
derived from error and corresponding to $\nu$ d.f., then $s\sqrt{(1/n_1+1/n_2)}$
is the s.e. of the difference of the means of the two classes, and
hence that the statistic

$$t = \frac{\bar x_1 - \bar x_2}{s\sqrt{(1/n_1+1/n_2)}}$$

is distributed like $t$ for $\nu$ d.f. This formula provides a method of testing
the significance of the difference of the means of two unequal classes.

5. In a certain large country, to test the variation in the price
of a certain commodity with district and season, six districts were
chosen at random, and the price was observed in each on six random
occasions throughout the year. The observations are recorded in pence
in the table below, in which columns correspond to seasons and rows
to districts. Discuss the significance of the variation with season and
with district.

    19    21    17    11    15    24
    18    22    16    12    16    25
    17    20    15    10    13    21
    18    21    13     9    14    18
    16    23    14    13    16    20
    20    22    16    14    17    23



Show that analysis of variance leads to the following results:

Source of variation     D.F.     Sum of squares     Mean square     F

Seasons                   5         492.33             98.46        55.5**
Districts                 5          44.33              8.86         5.0**
Error                    25          44.33              1.77         —

Total                    35         581.00              —            —

Both the variance ratios are highly significant. Show that the s.e.
of the difference of the means for two seasons or two districts is
0.77; and deduce that the prices in the first, second and last districts
do not differ significantly.

6. An agricultural experiment, on the Latin square plan, gave
the following results for the yield of wheat per acre, letters
corresponding to varieties, columns to treatments and rows to blocks.
Discuss the variation of yield with each of the factors:

    A 16    B 10    C 11    D  9    E  9
    E 10    C  9    A 14    B 12    D 11
    B 15    D  8    E  8    C 10    A 18
    D 12    E  6    B 13    A 13    C 12
    C 13    A 11    D 10    E  7    B 14

Show that the results obtained by analysis of variance are:

Source of variation     D.F.     Sum of squares     Mean square     F

Treatments                4          66.56             16.64        37.8**
Blocks                    4           2.16              0.54         1.2
Varieties                 4         122.56             30.64        69.6**
Error                    12           5.28              0.44

Total                    24         196.56              —            —

Show also that the s.e. of the difference of the means of two
classes is 0.42; and hence that for no two blocks do the means differ
significantly.
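
The figures in this table can be checked mechanically. The sketch below, in Python, is one way to do so under the reading of the square given above (rows as blocks, columns as treatments, letters as varieties); the helper `ss` and the variable names are chosen here for illustration only.

```python
from math import sqrt

# The 5 x 5 Latin square of Example 6: (variety, yield) in each cell.
square = [
    [("A", 16), ("B", 10), ("C", 11), ("D", 9), ("E", 9)],
    [("E", 10), ("C", 9), ("A", 14), ("B", 12), ("D", 11)],
    [("B", 15), ("D", 8), ("E", 8), ("C", 10), ("A", 18)],
    [("D", 12), ("E", 6), ("B", 13), ("A", 13), ("C", 12)],
    [("C", 13), ("A", 11), ("D", 10), ("E", 7), ("B", 14)],
]
m = len(square)
values = [y for row in square for _, y in row]
correction = sum(values) ** 2 / m**2     # T^2 / N


def ss(totals):
    """Sum of squares for a factor whose m class totals are given."""
    return sum(t * t for t in totals) / m - correction


block_totals = [sum(y for _, y in row) for row in square]
treatment_totals = [sum(square[i][j][1] for i in range(m)) for j in range(m)]
variety_totals = {}
for row in square:
    for variety, y in row:
        variety_totals[variety] = variety_totals.get(variety, 0) + y

ss_blocks = ss(block_totals)
ss_treatments = ss(treatment_totals)
ss_varieties = ss(variety_totals.values())
ss_total = sum(y * y for y in values) - correction
ss_error = ss_total - ss_blocks - ss_treatments - ss_varieties

df_factor, df_error = m - 1, (m - 1) * (m - 2)
ms_error = ss_error / df_error
F_treatments = ss_treatments / df_factor / ms_error
F_blocks = ss_blocks / df_factor / ms_error
F_varieties = ss_varieties / df_factor / ms_error

# Standard error of the difference of two class means (m values in each).
se_diff = sqrt(2 * ms_error / m)

print(round(ss_treatments, 2), round(ss_blocks, 2), round(ss_varieties, 2), round(ss_error, 2))
print(round(F_treatments, 1), round(F_blocks, 1), round(F_varieties, 1), round(se_diff, 2))
```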







7. A random sample of 100 pairs of values from a normal
population, grouped in 10 arrays of y's, gave a correlation ratio of y on x
equal to 0.3. Show that this value is not significant of association
between the variables.

8. A random sample of 160 pairs from a bivariate normal
population when grouped in 15 arrays of y's gave values $r = 0.4$ and
$\eta_{yx} = 0.5$. Show that these results are consistent with the
assumption of linearity of regression of y on x.

9. A parabola, fitted to a random sample of 45 pairs of values
from a normal population, gave an index of correlation $R = 0.3$.
Show that this value is not significant of parabolic regression. Also
show that it would not be significant for a sample of less than
83 pairs.

10. Give a direct proof that the expected value of the last sum
in § 105 (17) is $(h-1)\mu_{11}$.

11. Give a direct proof of the formula § 106 (21) for C.

12. In the example of § 107, show that the sum of squares
represented by $B - C^2/A$, which corresponds to 4 d.f., is highly significant.
Hence the deviations of the means of districts from the line of
regression fitted to them are too great to be attributed to chance.

13. Supply the details of the proof of § 108 (25). 

14. With the notation of § 107 the difference of the means of the
y's for the $p$th and $q$th classes, when corrected for regression, is

$$(\bar y_p - \bar y_q) - b'(\bar x_p - \bar x_q), \tag{i}$$

which consists of two independent parts. The estimated variance of
the first part is $2s^2/k$, where $k$ is the number in each class, and
$s^2$ is the error mean square $(B' - C'^2/A')/(N-h-1)$ in the analysis
of residual variance. The estimated variance of $b'$ is $s^2/A'$. Hence
show that the estimated variance of the difference (i) is

$$s^2\left\{\frac{2}{k} + \frac{(\bar x_p - \bar x_q)^2}{A'}\right\}. \tag{ii}$$

The quotient of the difference (i) by the square root of the expression
(ii) is distributed like $t$ with $N-h-1$ d.f. (Cf. Wishart, 1936, 2;
also Wishart and Sanders, 1936, 3, pp. 63-4.)



15. Examine, as in § 108, the covariation between the yields of
grain and straw, as indicated by the following data.* Each of 5
blocks was divided into 5 plots, and 5 different treatments (A, B,
C, D, E) were distributed at random among the plots of each block. The
columns correspond to blocks, and the yields of grain and straw are
denoted by x and y respectively:



         x    y      x    y      x    y      x    y      x    y

    A:  65   32     68   26     65   32     71   26     62   33
    B:  75   38     54   20     71   32     71   28     64   29
    C:  72   33     69   30     69   38     69   30     61   30
    D:  70   29     69   27     72   26     70   24     70   31
    E:  70   37     67   29     52   33     66   23     66   40

The factors of classification are blocks and treatments.
Diminishing all the x's by 66 and all the y's by 26, verify that

$$T_i = 22,\ -3,\ -1,\ 17,\ -7; \qquad T = 28,$$
$$T'_i = 39,\ 2,\ 31,\ 1,\ 33; \qquad T' = 106,$$

and calculate the values of $T_j$ and $T'_j$. Hence show that the sums of
squares and products are given by:


Source of       D.F.     Sum of squares        Sum of       Coefficient
variation                (x²)       (y²)       products     of regression

Blocks            4      135.04     265.76        2.68          0.020
Treatments        4       98.24      87.36      -64.12         -0.653
Error            16      455.36     211.44      174.72          0.384

Total            24      688.64     564.56      113.28           —


The estimate $s^2$ obtained from the residual sum of squares for
error is

$$s^2 = \frac{B' - C'^2/A'}{N-h-k} = \frac{144.34}{15} = 9.62.$$

To test the significance of $b'$ we compare with $s^2$ the estimate
$C'^2/A' = 67.1$ corresponding to 1 d.f. The variance ratio is

$$F = 67.1/9.62 = 6.9 \qquad (\nu_1 = 1,\ \nu_2 = 15).$$

* Data adapted from Fisher and Eden, Journ. Agric. Sci., vol. 17 (1927),
p. 548.




Thus the value of $b'$ is decidedly significant of positive correlation
between x and y, apart from the factors of classification. To test
whether the yield of straw, corrected for yield of grain, varies
significantly with the treatment, we calculate

$$C = C_2 + C' = 110.6, \qquad A = A_2 + A' = 553.6$$

and compare the estimate of variance $(B_2 + C'^2/A' - C^2/A)/(k-1)$
based on 4 d.f. with $s^2$ based on 15. The former has the value

$$(87.36 + 67.1 - 22.1)/4 = 33.09,$$

and the latter is 9.62. The variance ratio is

$$F = 33.09/9.62 = 3.44$$

which, for $\nu_1 = 4$ and $\nu_2 = 15$, is significant at the 5 % level. We
conclude that the yield of straw, after correction for yield of grain,
does vary significantly with the treatment.

Show that the difference $b' - b_2$, tested by § 108 (26), is decidedly
significant.
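
The whole of this example may be retraced from the raw yields. The Python sketch below is offered as a check only: the function `partition` and the variable names (`Ae`, `Be`, `Ce` for the error sums $A'$, $B'$, $C'$) are introduced here for illustration and are not the book's notation. Working with unrounded intermediate values, it gives ratios that differ slightly in the last figure from those printed above.

```python
# Yields of grain (x) and straw (y): rows are treatments A-E, columns are blocks.
x = [
    [65, 68, 65, 71, 62],
    [75, 54, 71, 71, 64],
    [72, 69, 69, 69, 61],
    [70, 69, 72, 70, 70],
    [70, 67, 52, 66, 66],
]
y = [
    [32, 26, 32, 26, 33],
    [38, 20, 32, 28, 29],
    [33, 30, 38, 30, 30],
    [29, 27, 26, 24, 31],
    [37, 29, 33, 23, 40],
]
k = len(x)       # groups: treatments (rows)
h = len(x[0])    # classes: blocks (columns)
N = h * k


def partition(u, v):
    """Split the total sum of products of u and v into
    (blocks, treatments, error, total), as in equation (25)."""
    correction = sum(map(sum, u)) * sum(map(sum, v)) / N
    total = sum(u[j][i] * v[j][i] for j in range(k) for i in range(h)) - correction
    col_u = [sum(u[j][i] for j in range(k)) for i in range(h)]
    col_v = [sum(v[j][i] for j in range(k)) for i in range(h)]
    blocks = sum(a * b for a, b in zip(col_u, col_v)) / k - correction
    treatments = sum(sum(ru) * sum(rv) for ru, rv in zip(u, v)) / h - correction
    return blocks, treatments, total - blocks - treatments, total


A1, A2, Ae, At = partition(x, x)   # sums of squares of x
B1, B2, Be, Bt = partition(y, y)   # sums of squares of y
C1, C2, Ce, Ct = partition(x, y)   # sums of products

# Error variance of y after correction for regression, on N - h - k d.f.
s2 = (Be - Ce**2 / Ae) / (N - h - k)
F_regression = (Ce**2 / Ae) / s2

# Treatment means corrected for regression: combine treatments with error.
A, C = A2 + Ae, C2 + Ce
F_treatments = ((B2 + Ce**2 / Ae - C**2 / A) / (k - 1)) / s2

# Equation (26) applied to treatments: difference between b2 and b'.
b2, b_err = C2 / A2, Ce / Ae
F_slopes = (A2 * Ae * (b2 - b_err) ** 2 / (A2 + Ae)) / s2

print(round(s2, 2), round(F_regression, 2), round(F_treatments, 2), round(F_slopes, 2))
```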





CHAPTER XII


MULTIVARIATE DISTRIBUTIONS. 
PARTIAL AND MULTIPLE CORRELATIONS 

109. Introductory. Yule’s notation 

In considering a bivariate distribution we saw that, when it is 
known that the values of one variable are influenced by those of 
another, the coefficient of correlation provides a useful measure of 
the degree of association between them. But it often happens that 
the values of a variable are influenced by those of several others. 
It is known, for instance, that the statures of men are influenced
by those of their ancestors; and the yield of grain is affected by the 
amounts of different fertilizers used. In such cases our data usually 
constitute a distribution of values of several variables. If we are 
concerned with the combined influence of a group of variables upon 
a variable not included in that group, our study is that of multiple 
regression and multiple correlation. If, however, we wish to examine 
the effect of one variable upon a second, after eliminating the effects 
of other variables, our problem is that of partial correlation. The 
analysis involved is rendered simple and compact by a notation due 
to Yule,* which has gained a fairly wide acceptance in recent years. 
Before applying this to the more general case of several variables, 
we shall pave the way for the student by reviewing the results 
obtained for two variables in Chapter iv, and expressing them in 
Yule’s notation. 

Let the two variables, measured from their means, be $x_1$ and $x_2$,
with standard deviations $\sigma_1$ and $\sigma_2$. If the lines of regression of
$x_1$ on $x_2$, and of $x_2$ on $x_1$, are

$$x_1 = b_{12}x_2, \qquad x_2 = b_{21}x_1, \tag{1}$$

respectively, the residuals, or errors of estimate of the variables
incurred by using these equations, are expressed by

$$x_{1.2} = x_1 - b_{12}x_2, \qquad x_{2.1} = x_2 - b_{21}x_1. \tag{2}$$

* Yule, 1907, 1.



These residuals are the deviations of the representative points from
the corresponding line of regression. The values of the coefficients
of regression, $b_{12}$ and $b_{21}$, are obtained by minimizing the sums of
squares of the residuals; and this is done by equating to zero the
partial derivatives of these sums with respect to $b_{12}$ and $b_{21}$, and the
constant terms. This leads to the normal equations

$$\sum (x_1 - b_{12}x_2) = 0, \qquad \sum x_2(x_1 - b_{12}x_2) = 0 \tag{3}$$

in the one case, and

$$\sum (x_2 - b_{21}x_1) = 0, \qquad \sum x_1(x_2 - b_{21}x_1) = 0 \tag{3'}$$

in the other. The first equation in each case expresses the fact that
the mean of the distribution is on the line of regression; and, with
the mean as origin, the equations of these lines have the simple
form (1). The above normal equations, expressed in the more
compact notation, are

$$\sum x_{1.2} = 0, \qquad \sum x_2x_{1.2} = 0 \tag{3}$$

and

$$\sum x_{2.1} = 0, \qquad \sum x_1x_{2.1} = 0 \tag{3'}$$

respectively, the summation including all pairs of values of the
distribution. It will be observed that, whereas in Chapter IV
subscripts were used to distinguish the pairs of values in the
distribution, they are here used to distinguish the different variables. They
are no longer needed for the former purpose, since $\sum$ will throughout
denote summation over the whole distribution.

From (3) and (3') we have the familiar values of the coefficients
of regression,

$$b_{12} = \sum x_1x_2 \Big/ \sum x_2^2, \qquad b_{21} = \sum x_1x_2 \Big/ \sum x_1^2.$$

The coefficient of correlation, which we denote by $r_{12}$ or $r_{21}$, is
such that

$$r_{12}^2 = b_{12}b_{21}, \tag{4}$$

and

$$b_{12} = r_{12}\sigma_1/\sigma_2, \qquad b_{21} = r_{12}\sigma_2/\sigma_1, \tag{5}$$

$r_{12}$ having the same sign as $b_{12}$ or $b_{21}$, which is the sign of $\sum x_1x_2$. It
should be remembered that, although $r_{12} = r_{21}$, the values of $b_{12}$
and $b_{21}$ are in general different. Lastly the mean squares of the
deviations (2) are denoted by $\sigma_{1.2}^2$ and $\sigma_{2.1}^2$ respectively. In virtue
of § 26 (21) and (23), these are given by

$$\sigma_{1.2}^2 = \frac{1}{N}\sum x_{1.2}^2 = \sigma_1^2(1 - r_{12}^2), \qquad \sigma_{2.1}^2 = \frac{1}{N}\sum x_{2.1}^2 = \sigma_2^2(1 - r_{12}^2). \tag{6}$$

The quantities $\sigma_{1.2}$ and $\sigma_{2.1}$ are referred to as standard deviations of
the first order, the order being the number of subscripts following the
point. These are called secondary subscripts, while primary subscripts
are those preceding the point. The standard deviations $\sigma_1$ and $\sigma_2$ are of
zero order. The residuals $x_{1.2}$ and $x_{2.1}$ are deviations of the first order.

The regression of $x_1$ on $x_2$ is said to be linear, if the mean of each
array of $x_1$'s is on the line of regression.
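
The relations (3) to (6) are easily checked on a small set of figures. The following Python sketch does so; the five pairs of observations are hypothetical values chosen here purely for illustration, and the variable names are likewise not from the text.

```python
from math import sqrt

# Hypothetical paired observations (illustrative only, not from the text).
raw1 = [2.0, 4.0, 5.0, 7.0, 12.0]
raw2 = [1.0, 3.0, 2.0, 6.0, 8.0]
n = len(raw1)

# Measure both variables from their means, as the text assumes.
m1, m2 = sum(raw1) / n, sum(raw2) / n
x1 = [v - m1 for v in raw1]
x2 = [v - m2 for v in raw2]

s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))

b12 = s12 / s22                 # regression coefficient of x1 on x2
b21 = s12 / s11                 # regression coefficient of x2 on x1
r12 = s12 / sqrt(s11 * s22)     # total correlation

# First-order residuals x_{1.2} and their mean square, as in (2) and (6).
x1_2 = [a - b12 * b for a, b in zip(x1, x2)]
var1 = s11 / n
var1_2 = sum(e * e for e in x1_2) / n

print(b12, b21, r12)
print(var1_2, var1 * (1 - r12**2))   # the two sides of (6)
```

The two numbers in the last line of output agree, and the residuals satisfy the normal equations (3) to within rounding error.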


110. Distribution of three or more variables 

Consider next a distribution of three variables which, measured
from their means, are $x_1$, $x_2$ and $x_3$. The data consist of N sets of
corresponding values of the three variables. We enquire first as to
the best estimate of $x_1$ that can be obtained from the data in the
form of a linear function of $x_2$ and $x_3$. The regression equation is
thus of the form

$$x_1 = a + b_{12.3}x_2 + b_{13.2}x_3.$$


The 'best' estimate is interpreted in accordance with the principle
of least squares, the constants being chosen so as to make the sum
of the squares of the errors of estimate a minimum. The subscripts
to the coefficients are written down on the following principle. The
first is that of the variable for which the estimate is being found,
and the second is that of the variable which the coefficient multiplies.
These are primary subscripts. Separated from them by a point are
the subscripts of the other variables that enter into the equation.
These are secondary subscripts, and their number determines the
order of the regression coefficient. In the above equation the
coefficients are partial regression coefficients of the first order. The
significance of the term 'partial' will be explained shortly. The
constant $a$ has been given no subscripts, because it will appear
immediately that its value is zero.



The sum of the squares of the residuals to be minimized is

$$\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3)^2.$$

Equating to zero its partial derivatives with respect to $a$ and the
$b$'s we have the normal equations

$$\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0,$$
$$\sum x_2(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0,$$
$$\sum x_3(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0.$$

Since the mean of each variable is zero, the first of these equations
gives $a = 0$; and the regression equation of $x_1$ on $x_2$ and $x_3$ is simply

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3. \tag{7}$$

The other two normal equations may then be written more concisely

$$\sum x_2x_{1.23} = 0 = \sum x_3x_{1.23}, \tag{8}$$

in which $x_{1.23}$, defined by

$$x_{1.23} = x_1 - b_{12.3}x_2 - b_{13.2}x_3, \tag{9}$$

is the residual, or error of estimate of $x_1$ from the regression equation
(7). Its mean is zero, since those of the other variables are zero.
Hence the s.d. of this residual, denoted by $\sigma_{1.23}$, is given by

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2, \tag{10}$$

the summation covering the whole distribution, whose total
frequency is N. This is a s.d. of order two, since it has two secondary
subscripts.

Similarly, we have the regression equation of $x_2$ on $x_1$ and $x_3$,
namely,

$$x_2 = b_{21.3}x_1 + b_{23.1}x_3, \tag{11}$$

for which the error of estimate is

$$x_{2.13} = x_2 - b_{21.3}x_1 - b_{23.1}x_3, \tag{12}$$

and the normal equations

$$\sum x_1x_{2.13} = 0 = \sum x_3x_{2.13}. \tag{13}$$

There are also similar equations for the regression of $x_3$ on $x_1$ and
$x_2$. In general the coefficients $b_{12.3}$ and $b_{21.3}$ are different.



The normal equations (3), (8) and (13) express that the sum of the
products of corresponding values of a variable and a residual is zero,
when the subscript of the variable is included among the secondary
subscripts of the residual, the summation covering the whole
distribution. Further, if $x_{1.2}$ is the residual defined by (2), we have

$$\sum x_{1.23}x_{1.2} = \sum x_{1.23}(x_1 - b_{12}x_2) = \sum x_{1.23}x_1.$$

Similarly,

$$\sum x_{1.23}x_{1.23} = \sum x_{1.23}(x_1 - b_{12.3}x_2 - b_{13.2}x_3) = \sum x_{1.23}x_1$$

in virtue of (8). Thus it is evident that the sum of the products of two
residuals is unaltered by omitting from one of the factors any secondary
subscripts which are common to both. In virtue of this result and the
normal equations it follows that the sum of the products of two residuals
is zero if all the subscripts of the one are included among the secondary
subscripts of the other.

The argument of this section is also applicable to the case of $n$
variables $x_1, x_2, \ldots, x_n$. In the regression equation of $x_1$ on $x_2, \ldots, x_n$,
which corresponds to (7), the coefficients have each $n-2$ secondary
subscripts, and are therefore of that order. The residual $x_{1.23\ldots n}$
corresponding to (9) is of order $n-1$, as is also its s.d. $\sigma_{1.23\ldots n}$.
The normal equations are

$$\sum x_i x_{1.23\ldots n} = 0 \qquad (i = 2, 3, \ldots, n).$$

And the theorem concerning the omission of common secondary
subscripts is equally valid in the case of $n$ variables.

111. Determination of the coefficients of regression 

The regression equation (7) may be interpreted as representing a
plane, called the plane of regression of $x_1$ on $x_2$ and $x_3$. The normal
equations (8) may be expressed

$$\begin{aligned} \sigma_1\sigma_2r_{12} - b_{12.3}\sigma_2^2 - b_{13.2}\sigma_2\sigma_3r_{23} &= 0,\\ \sigma_1\sigma_3r_{13} - b_{12.3}\sigma_2\sigma_3r_{23} - b_{13.2}\sigma_3^2 &= 0, \end{aligned} \tag{14}$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$, obtained by ignoring
the values of $x_3$. This is called the total correlation of $x_1$ and $x_2$.
Similarly, $r_{23}$ is the total correlation of $x_2$ and $x_3$, and so on. These
coefficients are symmetrical in the subscripts. We may eliminate
the $b$'s between (7) and (14) by equating to zero the determinant of
their coefficients and the remaining terms. Dividing the second
and third rows of this determinant by $\sigma_2$ and $\sigma_3$ respectively, and
then the first, second and third columns by $\sigma_1$, $\sigma_2$ and $\sigma_3$
respectively, we obtain the regression equation of $x_1$ on the other variables
in the form

$$\begin{vmatrix} x_1/\sigma_1 & x_2/\sigma_2 & x_3/\sigma_3 \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} = 0. \tag{15}$$

If then $\omega$ is used to denote the determinant

$$\omega = \begin{vmatrix} 1 & r_{21} & r_{31} \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} \tag{16}$$

and $\omega_{ij}$ is the cofactor of the element in the $i$th column and the
$j$th row, we may write (15) as

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \frac{x_3}{\sigma_3}\omega_{13} = 0. \tag{17}$$

From this form of the regression equation it follows that

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\omega_{12}}{\omega_{11}}, \qquad b_{13.2} = -\frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}}. \tag{18}$$

The regression of $x_1$ on $x_2$ and $x_3$ is said to be linear if the mean of
each array of $x_1$'s lies on the plane of regression (17).

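The residual $x_{1.23}$ may be looked upon as the deviation of the
representative point from the plane of regression. The term deviation
is therefore often used in place of residual. The variance $\sigma_{1.23}^2$ of
$x_{1.23}$ may be expressed in terms of $\omega$, $\omega_{11}$ and $\sigma_1$. For

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2 = \sum x_1x_{1.23} = \sum x_1(x_1 - b_{12.3}x_2 - b_{13.2}x_3).$$

This is equivalent to

$$(\sigma_1^2 - \sigma_{1.23}^2) - b_{12.3}\sigma_1\sigma_2r_{21} - b_{13.2}\sigma_1\sigma_3r_{31} = 0,$$

and, on eliminating the $b$'s between this equation and (14), we have
a result which may be expressed

$$\begin{vmatrix} 1 - \sigma_{1.23}^2/\sigma_1^2 & r_{21} & r_{31} \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} = 0,$$

or

$$\omega - \frac{\sigma_{1.23}^2}{\sigma_1^2}\omega_{11} = 0.$$

Consequently

$$\sigma_{1.23}^2 = \omega\sigma_1^2/\omega_{11}. \tag{19}$$

The above argument clearly holds for the more general case of
$n$ variables. In place of (16) we then have a determinant $\omega$ of order
$n$; and the regression equation of $x_1$ on $x_2, \ldots, x_n$ is

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \ldots + \frac{x_n}{\sigma_n}\omega_{1n} = 0. \tag{20}$$

From this we obtain, as in (18), the regression coefficients of order
$n-2$. In place of (19) we have an equation giving the variance of
the deviation $x_{1.23\ldots n}$, namely,

$$\sigma_{1.23\ldots n}^2 = \omega\sigma_1^2/\omega_{11}. \tag{21}$$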

Example. Let $x_1$, $x_2$, $x_3$ in. be the excesses of the heights of father, mother
and son respectively above their mean values. A distribution of these variables
gave the following approximate correlations and standard deviations:*

$$r_{12} = 0.28, \qquad r_{23} = 0.49, \qquad r_{31} = 0.51,$$
$$\sigma_1 = 2.7, \qquad \sigma_2 = 2.4, \qquad \sigma_3 = 2.7.$$

Show that the regression equation of $x_3$ on $x_1$ and $x_2$ is

$$x_3 = 0.40x_1 + 0.42x_2,$$

and deduce that, if the mean heights of father, mother and son are 67.68,
62.48 and 68.65 in. respectively, the regression equation for actual heights
$X_1$, $X_2$, $X_3$ is

$$X_3 = 15.34 + 0.40X_1 + 0.42X_2.$$

Also show that $\sigma_{3.12} = 2.1$.

* Modified data from Pearson and Lee, Biometrika, vol. 2 (1903), pp.
357-462.
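
The results asked for in this example follow from (18) and (19) with the subscripts permuted so that $x_3$ is the dependent variable. The Python sketch below carries out the computation by cofactors; the helper functions `minor`, `det2`, `cofactor` and `det3` are written here for illustration, and rows and columns are numbered from 0, so that index 2 refers to $x_3$.

```python
from math import sqrt


def minor(m, i, j):
    """The 2 x 2 matrix left after deleting row i and column j."""
    return [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def cofactor(m, i, j):
    return (-1) ** (i + j) * det2(minor(m, i, j))


def det3(m):
    """Determinant of a 3 x 3 matrix, expanded by its first row."""
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(3))


r12, r23, r31 = 0.28, 0.49, 0.51
R = [[1.0, r12, r31],
     [r12, 1.0, r23],
     [r31, r23, 1.0]]
sigma = [2.7, 2.4, 2.7]
means = [67.68, 62.48, 68.65]

omega = det3(R)
w33 = cofactor(R, 2, 2)

# Analogue of (18) for the regression of x3 on x1 and x2.
b31_2 = -(sigma[2] / sigma[0]) * cofactor(R, 2, 0) / w33
b32_1 = -(sigma[2] / sigma[1]) * cofactor(R, 2, 1) / w33

# Analogue of (19): standard deviation of the residual x_{3.12}.
sd3_12 = sigma[2] * sqrt(omega / w33)

# Constant term for actual heights, using the coefficients rounded to
# two places as in the text.
intercept = means[2] - round(b31_2, 2) * means[0] - round(b32_1, 2) * means[1]

print(round(b31_2, 2), round(b32_1, 2), round(intercept, 2), round(sd3_12, 1))
```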




112. Multiple correlation 

We proceed to find the correlation between the variable $x_1$ and
its estimate from the regression equation (7). This correlation,
which is an indication of the agreement between $x_1$ and its estimate,
is called the coefficient of multiple correlation between $x_1$ and the two
variables $x_2$ and $x_3$, and is denoted* by $R_{1(23)}$. It may be determined
as follows. If $e_{1.23}$ is the estimate of $x_1$ given by (7), we have

$$e_{1.23} = b_{12.3}x_2 + b_{13.2}x_3$$

or

$$e_{1.23} = x_1 - x_{1.23}.$$

The mean value of $e_{1.23}$ is zero, since those of $x_1$ and $x_{1.23}$ are both
zero. The sum of the products of $x_1$ and $e_{1.23}$ is

$$\sum x_1e_{1.23} = \sum x_1(x_1 - x_{1.23}) = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Also

$$\sum e_{1.23}^2 = \sum (x_1 - x_{1.23})^2 = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Consequently the coefficient of correlation between $x_1$ and $e_{1.23}$ is
given by

$$R_{1(23)} = \frac{\sigma_1^2 - \sigma_{1.23}^2}{\sigma_1\sqrt{(\sigma_1^2 - \sigma_{1.23}^2)}} = \frac{1}{\sigma_1}\sqrt{(\sigma_1^2 - \sigma_{1.23}^2)}.$$

This is the required coefficient of multiple correlation. The result
may be expressed

$$\sigma_{1.23}^2 = \sigma_1^2(1 - R_{1(23)}^2), \tag{23}$$

which is analogous to (6) and to § 41 (46). Comparing (23) and (19)
we see that

$$1 - R_{1(23)}^2 = \omega/\omega_{11}, \tag{24}$$

whence

$$R_{1(23)}^2 = 1 - \omega/\omega_{11} = 1 - \omega/(1 - r_{23}^2) = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}, \tag{25}$$

which expresses the multiple correlation in terms of the total
correlations between the pairs of variables.


♦ Some writers prefer the notation 



We may note that $R_{1(23)}$ is never negative. For, from the above
argument, $\sum x_1e_{1.23} = \sum e_{1.23}^2$, which cannot be negative. Further,
when $R_{1(23)} = 1$ it follows from (23) that $\sigma_{1.23}^2 = 0$, which requires
all the deviations $x_{1.23}$ to be zero, so that $x_1$ is given accurately by
the regression equation. In this case $x_1$ is a linear function of $x_2$
and $x_3$.

The argument is also valid for a distribution of $n$ variables. The
multiple correlation $R_{1(23\ldots n)}$ is the correlation between $x_1$ and its
estimate $e_{1.23\ldots n}$ from the regression equation (20); and the
argument leads to the corresponding formula

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - R_{1(23\ldots n)}^2) \tag{23'}$$

and

$$1 - R_{1(23\ldots n)}^2 = \omega/\omega_{11}. \tag{24'}$$

If the multiple correlation is equal to unity, $x_1$ is a linear function
of $x_2, \ldots, x_n$.

Example 1. Prove the following relations:

$$\sum e_{1.23}x_{1.23} = 0,$$
$$\sum x_1^2 = \sum x_{1.23}^2 + b_{12.3}\sum x_1x_2 + b_{13.2}\sum x_1x_3,$$
$$R_{1(23)}^2 = \left(b_{12.3}\sum x_1x_2 + b_{13.2}\sum x_1x_3\right)\Big/\sum x_1^2.$$

Example 2. Show that, for the example in the preceding section,

$$R_{3(12)} = 0.63.$$
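
Example 2 may be verified directly from (25), with the subscripts permuted so that $x_3$ is the dependent variable. The following Python sketch assumes the correlations of the preceding example, reading $r_{31} = 0.51$; the function name `multiple_r` and its argument names are chosen here for illustration.

```python
from math import sqrt


def multiple_r(r_ya, r_yb, r_ab):
    """Multiple correlation of y with the pair (a, b), by formula (25):
    r_ya and r_yb are the total correlations of y with a and with b,
    and r_ab is the total correlation between a and b."""
    return sqrt((r_ya**2 + r_yb**2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab**2))


r12, r23, r31 = 0.28, 0.49, 0.51   # father-mother, mother-son, son-father

R3_12 = multiple_r(r31, r23, r12)

# Cross-check against (24): 1 - R^2 = omega / omega_33.
omega = 1 + 2 * r12 * r23 * r31 - r12**2 - r23**2 - r31**2
omega_33 = 1 - r12**2

print(round(R3_12, 2))
```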


113. Partial correlation 

Consider next the correlation between the deviations $x_{1.3}$ and
$x_{2.3}$. Since $x_{1.3}$ is the deviation of $x_1$ from its estimate in terms of
$x_3$, we may regard it as that part of the variable $x_1$ which remains
after the influence of $x_3$ has been eliminated, as far as can be done by
a linear equation. A similar interpretation can be given to $x_{2.3}$.
Hence the correlation between these deviations may be looked upon
as the correlation between $x_1$ and $x_2$ after the influence of $x_3$ has been
eliminated. We denote this correlation by $r_{12.3}$, and call it the partial
correlation between $x_1$ and $x_2$ in the trivariate distribution. Having
one secondary subscript, $r_{12.3}$ is a partial correlation of the first
order. It will be remembered that, in calculating the total correlation
$r_{12}$, the values of $x_3$ are simply ignored. It is therefore a correlation
calculated on the assumption that the variables $x_1$ and $x_2$ are
influenced only by each other, and not by any other variable.

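To find the partial correlation $r_{12.3}$ we observe that, by the
theorems of §110,

$$0 = \sum x_{2.3} x_{1.23} = \sum x_{2.3} (x_1 - b_{12.3} x_2 - b_{13.2} x_3)$$

$$= \sum x_1 x_{2.3} - b_{12.3} \sum x_2 x_{2.3} = \sum x_{1.3} x_{2.3} - b_{12.3} \sum x_{2.3}^2,$$

so that

$$b_{12.3} = \sum x_{1.3} x_{2.3} / \sum x_{2.3}^2. \qquad (26)$$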


From this result it follows that $b_{12.3}$ is the coefficient of regression
of $x_{1.3}$ on $x_{2.3}$. Similarly, $b_{21.3}$ is the coefficient of regression of $x_{2.3}$
on $x_{1.3}$; and the coefficient of correlation between these deviations
is therefore given by

$$r_{12.3}^2 = b_{12.3} b_{21.3}. \qquad (27)$$

Since $\sigma_{1.3}$ and $\sigma_{2.3}$ are the standard deviations of $x_{1.3}$ and $x_{2.3}$, the
coefficient of partial correlation is connected with the coefficient of
partial regression by the usual formulae

$$b_{12.3} = r_{12.3} \sigma_{1.3}/\sigma_{2.3}, \qquad b_{21.3} = r_{21.3} \sigma_{2.3}/\sigma_{1.3}. \qquad (28)$$

And it is now clear why the above $b$'s are called coefficients of partial
regression. The correlation $r_{12.3}$ has the same sign as $b_{12.3}$, which is
the sign of $-\omega_{12}$ by (18). Also, in virtue of (27), $r_{12.3}$ is symmetrical
in the primary subscripts. If the values of the $b$'s are substituted
from (18) in (27) we obtain

$$r_{12.3}^2 = \frac{\omega_{12} \omega_{21}}{\omega_{11} \omega_{22}} = \frac{\omega_{12}^2}{\omega_{11} \omega_{22}},$$

so that

$$r_{12.3} = \frac{-\omega_{12}}{\sqrt{\omega_{11} \omega_{22}}} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}. \qquad (29)$$

Similar formulae may be written down for $r_{13.2}$ and $r_{23.1}$. The partial
correlations of the first order are thus expressible in terms of the
total correlation coefficients.
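Formula (29) translates directly into a few lines of Python. This is a minimal sketch; the function name `partial_correlation`, its argument names and the sample correlations are assumptions made for illustration.

```python
import math


def partial_correlation(r_ij: float, r_ik: float, r_jk: float) -> float:
    """First-order partial correlation r_{ij.k} from three total correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik**2) * (1.0 - r_jk**2))


# Illustrative (assumed) total correlations r12, r13, r23.
r12, r13, r23 = 0.7, 0.6, 0.4
r12_3 = partial_correlation(r12, r13, r23)  # x3 eliminated
r13_2 = partial_correlation(r13, r12, r23)  # x2 eliminated
r23_1 = partial_correlation(r23, r12, r13)  # x1 eliminated
print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```

The third value comes out slightly negative although every total correlation is positive, which shows that a partial correlation need not share the sign of the corresponding total correlation.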

The partial correlation between $x_1$ and $x_2$ in the trivariate distribution
is sometimes defined as the correlation between $x_1$ and $x_2$
for a constant value of $x_3$. In general, however, this correlation will
depend upon the constant value of $x_3$ selected. In certain special
cases the second definition agrees with the first, and the result is
then the same for all constant values of $x_3$. Necessary and sufficient
conditions* for this agreement may be stated:

(a) In the bivariate distribution of $x_1$ and $x_3$ ($x_2$ being ignored),
the regression of $x_1$ on $x_3$ must be linear, and the standard deviations
of all the $x_1$ arrays ($x_3$ = const.) must be equal.

(b) In the trivariate distribution the regression of $x_1$ on $x_2$ and
$x_3$ must be linear, and the standard deviations of all the $x_1$ arrays
($x_2$ = const., $x_3$ = const.) must be equal.

These conditions are satisfied in the normal trivariate distribution
to be considered in §116.

For a distribution of $n$ variables there are partial correlations of
all orders from 1 to $n - 2$. Thus if $(k)$ denotes a definite group of
secondary subscripts not including $i$ and $j$, the correlation between
the deviations $x_{i.(k)}$ and $x_{j.(k)}$, denoted by $r_{ij.(k)}$, is a partial correlation
of order equal to the number $m$ of subscripts in $(k)$. It is connected
with the regression coefficients for these deviations, and their
standard deviations $\sigma_{i.(k)}$ and $\sigma_{j.(k)}$, by formulae analogous to (27) and
(28), namely,

$$r_{ij.(k)}^2 = b_{ij.(k)} b_{ji.(k)}, \qquad (30)$$

$$b_{ij.(k)} = r_{ij.(k)} \sigma_{i.(k)}/\sigma_{j.(k)}. \qquad (31)$$

If $(k)$ includes all the subscripts but $i$ and $j$, we find as in (29)

$$r_{ij.(k)} = -\omega_{ij}/\sqrt{\omega_{ii} \omega_{jj}},$$

the symbols on the right being cofactors in the determinant $\omega$ of
order $n$.

A partial correlation of order $m + 1$ is expressible in terms of
those of order $m$ by an equation of the same form as (29), with a set
$(k)$ of secondary subscripts added to each coefficient in the formula.
Thus

$$r_{ij.h(k)} = \frac{r_{ij.(k)} - r_{ih.(k)} r_{jh.(k)}}{\sqrt{(1 - r_{ih.(k)}^2)(1 - r_{jh.(k)}^2)}}, \qquad (32)$$

where $h$, $i$, $j$ are unequal.
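The recursion just stated lends itself to a short recursive routine. The sketch below is an illustration only: the function name `partial_corr`, the zero-based indexing and the 4 x 4 matrix of total correlations are all assumptions, not data from the text.

```python
import math


def partial_corr(r, i, j, controls):
    """Partial correlation r_{ij.(controls)} by repeated use of the recursion.

    r        : symmetric matrix (list of lists) of total correlations
    i, j     : zero-based primary subscripts
    controls : tuple of zero-based secondary subscripts to eliminate
    """
    if not controls:
        return r[i][j]
    h, rest = controls[0], controls[1:]
    # Coefficients of the next lower order, all with the same group `rest`.
    r_ij = partial_corr(r, i, j, rest)
    r_ih = partial_corr(r, i, h, rest)
    r_jh = partial_corr(r, j, h, rest)
    return (r_ij - r_ih * r_jh) / math.sqrt((1.0 - r_ih**2) * (1.0 - r_jh**2))


# Illustrative (assumed) correlation matrix for four variables.
r = [
    [1.0, 0.7, 0.6, 0.5],
    [0.7, 1.0, 0.4, 0.3],
    [0.6, 0.4, 1.0, 0.2],
    [0.5, 0.3, 0.2, 1.0],
]
second_order = partial_corr(r, 0, 1, (2, 3))
print(round(second_order, 4))
```

The order in which the secondary subscripts are eliminated does not affect the result, so `partial_corr(r, 0, 1, (3, 2))` gives the same value.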

Example. Prove that, for the example in § 111, 

Ui.t = 0-446, = 0-42. 


* For a proof see Camp, 1934, 1, pp. 341-2.




114. Reduction formula for the order of a standard deviation 
A s.d. of any order may be expressed in terms of a s.d. and a
correlation coefficient of lower order. Thus, since

$$\sum x_{1.23}^2 = \sum x_{1.2} (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_{1.2}^2 - b_{13.2} \sum x_{1.2} x_{3.2},$$

we have, on dividing by $N$ and using (26) with subscripts interchanged,

$$\sigma_{1.23}^2 = \sigma_{1.2}^2 (1 - b_{13.2} b_{31.2}) = \sigma_{1.2}^2 (1 - r_{13.2}^2), \qquad (33)$$

which is of the same form as (6). Since $\sigma_{1.23}^2$ is symmetrical in the
secondary subscripts, the subscripts 2 and 3 may be interchanged
in the second member of (33). Substituting the value of $\sigma_{1.2}^2$ given
by (6) we may also write the result

$$\sigma_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2)(1 - r_{13.2}^2). \qquad (34)$$

Similar formulae may be written down by cyclic permutation of the
subscripts. The equations (33) and (34) show how a s.d. may be
expressed in terms of one of lower order, and one or more of the
correlation coefficients. Also from (33) it follows that $\sigma_{1.23} \le \sigma_{1.2}$.
The estimate of $x_1$ from $x_2$ and $x_3$ is thus in general better than the
estimate from $x_2$ alone, being just as good only if $r_{13.2}$ is zero.
Further, on comparing (34) with (23) we see that

$$1 - R_{1(23)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2), \qquad (35)$$

which is in agreement with the values already found for $R_{1(23)}^2$ and
$r_{13.2}$ in terms of the total correlations. From (35) it is clear that

$$1 - R_{1(23)}^2 \le 1 - r_{12}^2,$$

so that

$$R_{1(23)}^2 \ge r_{12}^2.$$

Since $R_{1(23)}$ is symmetrical in the subscripts 2 and 3, it follows that
this coefficient of multiple correlation is not numerically less than
either $r_{12}$ or $r_{13}$. If then $R_{1(23)}$ is zero, both $r_{12}$ and $r_{13}$ must be zero,
and $x_1$ is then uncorrelated with either $x_2$ or $x_3$.
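The reduction of a second-order s.d. to a first-order s.d. and a partial correlation can be verified numerically against the determinant form of the same quantity. The sketch below is illustrative; the function names and the sample values are assumptions.

```python
import math


def first_order_partial(r_ij, r_ik, r_jk):
    """r_{ij.k} from the total correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik**2) * (1.0 - r_jk**2))


def sd_by_reduction(sigma1, r12, r13, r23):
    """sigma_{1.23} as sigma_1 * sqrt((1 - r12^2)(1 - r13.2^2))."""
    r13_2 = first_order_partial(r13, r12, r23)
    return sigma1 * math.sqrt((1.0 - r12**2) * (1.0 - r13_2**2))


def sd_by_determinant(sigma1, r12, r13, r23):
    """sigma_{1.23} as sigma_1 * sqrt(omega / omega_11)."""
    omega = 1.0 - r12**2 - r13**2 - r23**2 + 2.0 * r12 * r13 * r23
    return sigma1 * math.sqrt(omega / (1.0 - r23**2))


# Illustrative (assumed) values: sigma_1 = 3, r12 = 0.7, r13 = 0.6, r23 = 0.4.
print(round(sd_by_reduction(3.0, 0.7, 0.6, 0.4), 3))
print(round(sd_by_determinant(3.0, 0.7, 0.6, 0.4), 3))
```

Both routes give the same value, and it is smaller than the first-order s.d. $3\sqrt{1 - 0.7^2}$, as the inequality above requires.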

The same argument shows that, in the case of $n$ variables, the
equation (33) has the generalization

$$\sigma_{1.23(k)}^2 = \sigma_{1.2(k)}^2 (1 - r_{13.2(k)}^2), \qquad (36)$$

where $(k)$ denotes as usual a group of secondary subscripts. Repeated
application of this formula leads to a generalization of (34), namely,

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2 (1 - r_{12}^2)(1 - r_{13.2}^2) \ldots (1 - r_{1n.23\ldots(n-1)}^2). \qquad (37)$$

Comparing this with (23') we see that

$$1 - R_{1(23\ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2) \ldots (1 - r_{1n.23\ldots(n-1)}^2). \qquad (38)$$

If $R_{1(23\ldots n)} = 0$ each of the correlations in the second member is
zero, and also each of the coefficients $r_{12}, r_{13}, \ldots, r_{1n}$. Thus $x_1$ is then
uncorrelated with any of the other variables.


115. Reduction formula for the order of a regression coefficient 

To express a coefficient of regression in terms of coefficients of
lower order we may proceed as follows. First

$$\sum x_{1.3} x_{2.3} = \sum x_1 (x_2 - b_{23} x_3).$$

Then, since $b_{23} = b_{32} \sigma_2^2/\sigma_3^2$, we may write the above equation, in
virtue of (26),

$$\sigma_{2.3}^2 b_{12.3} = b_{12} \sigma_2^2 - b_{13} \sigma_3^2 \cdot b_{32} \sigma_2^2/\sigma_3^2,$$

or, by means of (6),

$$\sigma_2^2 (1 - r_{23}^2) b_{12.3} = \sigma_2^2 (b_{12} - b_{13} b_{32}).$$

Thus we have the required reduction formula

$$b_{12.3} = \frac{b_{12} - b_{13} b_{32}}{1 - b_{23} b_{32}}. \qquad (39)$$

This may be expressed in terms of correlations. For, in virtue of
(28), it is equivalent to

$$r_{12.3} \frac{\sigma_{1.3}}{\sigma_{2.3}} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}. \qquad (40)$$

Substituting the values of $\sigma_{1.3}$ and $\sigma_{2.3}$ given by equations of the
form (6), we find (29) again.

In the case of $n$ variables, by similar reasoning, we arrive at the
generalization of (39),

$$b_{12.3(k)} = \frac{b_{12.(k)} - b_{13.(k)} b_{32.(k)}}{1 - b_{23.(k)} b_{32.(k)}},$$

where $(k)$ has the usual significance.
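The reduction formula (39) can be sketched in Python as follows. The helper names `total_regression` and `partial_regression`, and the standard deviations and correlations used, are assumptions for illustration only.

```python
def total_regression(r_ij: float, sigma_i: float, sigma_j: float) -> float:
    """Total regression coefficient b_ij = r_ij * sigma_i / sigma_j."""
    return r_ij * sigma_i / sigma_j


def partial_regression(b12: float, b13: float, b23: float, b32: float) -> float:
    """b_{12.3} = (b12 - b13 * b32) / (1 - b23 * b32)."""
    return (b12 - b13 * b32) / (1.0 - b23 * b32)


# Illustrative (assumed) standard deviations and total correlations.
s1, s2, s3 = 3.0, 4.0, 5.0
r12, r13, r23 = 0.7, 0.6, 0.4

b12 = total_regression(r12, s1, s2)
b13 = total_regression(r13, s1, s3)
b23 = total_regression(r23, s2, s3)
b32 = total_regression(r23, s3, s2)

print(round(partial_regression(b12, b13, b23, b32), 3))  # about 0.411
```

Note that the product `b23 * b32` equals the square of $r_{23}$, which is why the denominator agrees with the factor $1 - r_{23}^2$ appearing in the derivation.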




116. Normal distribution 

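A generalization of the bivariate normal distribution to the case
of three variables may be obtained as follows.* First suppose that
the variables $x_2$ and $x_3$ are normally distributed about zero means
with correlation $r_{23}$. Then the probability that a pair of values chosen
at random will fall in the interval $dx_2 dx_3$ is

$$dP_1 = \frac{dx_2 dx_3}{2\pi \sigma_2 \sigma_3 \sqrt{\omega_{11}}} \exp\left[ -\frac{1}{2\omega_{11}} \left( \frac{x_2^2}{\sigma_2^2} + \frac{x_3^2}{\sigma_3^2} - \frac{2 r_{23} x_2 x_3}{\sigma_2 \sigma_3} \right) \right],$$

where $\omega_{11} = 1 - r_{23}^2$. Next assume that the regression of $x_1$ on $x_2$
and $x_3$ is linear, and that in each $x_1$ array the variable is normally
distributed with s.d. which is the same for each of these arrays.
Then, since the mean of each array is on the plane of regression (7),
the s.d. of each array is $\sigma_{1.23}$, given by

$$\sigma_{1.23}^2 = \sigma_1^2 \omega/\omega_{11}. \qquad (41)$$

Consequently the probability that $x_1$, chosen at random in an
assigned array, will fall in the interval $dx_1$ is

$$dP_2 = \frac{dx_1}{\sigma_{1.23} \sqrt{2\pi}} \exp\left( -\tfrac{1}{2} x_{1.23}^2/\sigma_{1.23}^2 \right) = \frac{\sqrt{\omega_{11}}\, dx_1}{\sigma_1 \sqrt{2\pi\omega}} \exp\left[ -\frac{1}{2\omega\omega_{11}} \left( \frac{\omega_{11} x_1}{\sigma_1} + \frac{\omega_{12} x_2}{\sigma_2} + \frac{\omega_{13} x_3}{\sigma_3} \right)^2 \right]$$

by (41) and (17). On forming the product $dP_1 dP_2$ we have the
probability that a set of values of the three variables, chosen at
random, will fall in the interval $dx_1 dx_2 dx_3$, as

$$dP = \frac{dx_1 dx_2 dx_3}{\sigma_1 \sigma_2 \sigma_3 \sqrt{(2\pi)^3 \omega}} \exp(-\tfrac{1}{2}\phi), \qquad (42)$$

where, after a little reduction, it will be found that

$$\phi = \frac{1}{\omega} \left( \frac{\omega_{11} x_1^2}{\sigma_1^2} + \frac{\omega_{22} x_2^2}{\sigma_2^2} + \frac{\omega_{33} x_3^2}{\sigma_3^2} + \frac{2\omega_{12} x_1 x_2}{\sigma_1 \sigma_2} + \frac{2\omega_{23} x_2 x_3}{\sigma_2 \sigma_3} + \frac{2\omega_{31} x_3 x_1}{\sigma_3 \sigma_1} \right). \qquad (43)$$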

The symmetry of (42) and (43) in the subscripts shows that the
properties of the three variables are similar. Thus all the regressions
are linear. The variance of each array of $x_2$'s in the trivariate
distribution is $\omega\sigma_2^2/\omega_{22} = \sigma_{2.13}^2$, and that of each array of $x_3$'s is
$\omega\sigma_3^2/\omega_{33} = \sigma_{3.12}^2$.

* Cf. Rietz, 1927, 2, pp. 106-7.

We may verify the statement made in §113 that, in the present
case, the correlation between $x_1$ and $x_2$ for a constant value of $x_3$
is the partial correlation $r_{12.3}$. For $\phi$ may be expressed in the form

$$\phi = \frac{1}{\omega} \left[ \frac{\omega_{11} (x_1 - b_{13} x_3)^2}{\sigma_1^2} + \frac{\omega_{22} (x_2 - b_{23} x_3)^2}{\sigma_2^2} + \frac{2\omega_{12} (x_1 - b_{13} x_3)(x_2 - b_{23} x_3)}{\sigma_1 \sigma_2} \right] + \frac{x_3^2}{\sigma_3^2}, \qquad (44)$$

and, on comparing (42) with §38 (28), we see that, for a constant
value of $x_3$, $x_1$ and $x_2$ are normally distributed about means $b_{13} x_3$
and $b_{23} x_3$ respectively, with correlation

$$-\omega_{12}/\sqrt{\omega_{11} \omega_{22}} = r_{12.3},$$

as stated. From (42) and (44) it is also clear that the deviations
$x_{1.3}$ and $x_{2.3}$ are normally distributed with correlation $r_{12.3}$.

The above results may be extended to the case of n variables.* 

117. Significance of an observed partial correlation

Fisher has proved that the sampling distribution of a partial
correlation coefficient† of order $k$, in samples from a normal population,
is of the same form as that of the correlation coefficient for
samples from a bivariate normal population, with the sample
number $N$ reduced by $k$. In particular, when the partial correlation
in the population is zero, the square of the partial correlation
coefficient, $r$, in samples of $N$ sets of values, is a $\beta_1(\tfrac12, \tfrac12(N - k - 2))$
variate. By the argument used in §89 it then follows that the
statistic $t$ defined by

$$t = \frac{r \sqrt{N - k - 2}}{\sqrt{1 - r^2}} \qquad (45)$$

conforms to the $t$ distribution for $(N - k - 2)$ d.f. We are thus able
to test the significance of an observed partial correlation.

* Cf. Yule and Kendall, 1937, 1, pp. 282-4.

† Fisher, 1924, 4.



Example 1. From a random sample of 21 sets of values from a normal
population the calculated value of a partial correlation of order three is 0.40.
Is this consistent with the assumption that the corresponding partial correlation
in the population is zero?

In applying the above test to our assumption we have $\nu = 21 - 3 - 2 = 16$,
and

$$t = \frac{0.4 \sqrt{16}}{\sqrt{0.84}} = \frac{1.6}{0.916} = 1.75.$$

This value of $t$ is not significant at the 5 % level, so that the observed correlation
is not significant of correlation in the population.
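The arithmetic of this example can be reproduced with a short Python sketch. The function name `partial_corr_t` is an assumption; the routine only forms the statistic and its degrees of freedom, and the comparison with a tabulated percentage point of $t$ is left to the reader.

```python
import math


def partial_corr_t(r: float, n: int, k: int):
    """Return (t, d.f.) for testing a zero partial correlation of order k.

    r : observed partial correlation
    n : number of sets of values in the sample
    k : order of the partial correlation (number of secondary subscripts)
    """
    dof = n - k - 2
    t = r * math.sqrt(dof) / math.sqrt(1.0 - r**2)
    return t, dof


t, dof = partial_corr_t(0.40, 21, 3)
print(round(t, 2), dof)  # 1.75 16
```

Setting `k = 0` gives the usual test for a total correlation coefficient with $N - 2$ d.f.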

From the sampling distribution it follows, as in the case of two
variables, that if Fisher's $z$ transformation of §94 is applied to the
above partial correlation of order $k$, the statistic $z$ is distributed
nearly normally with variance $1/(N - k - 3)$. We are thus able to
test if an observed partial correlation differs significantly from some
assumed value. Similarly, we can test the significance of the
difference of the observed partial correlations in two independent
samples.

Example 2. From independent samples, of 32 and 23 sets of values,
partial correlations of order four are found to be 0.4 and 0.6 respectively.
Examine (i) whether the first value is consistent with the assumption of a
normal population with a corresponding correlation of 0.7, and (ii) whether
the two samples may be regarded as from the same normal population.

(i) From Table 7 we find that the values $r_1 = 0.4$ and $\rho = 0.7$ correspond
to $z_1 = 0.424$ and $z_0 = 0.868$. The deviation of $z_1$ is therefore 0.444.
The s.e. of $z_1$ is $1/\sqrt{32 - 4 - 3} = 0.2$. Since the deviation of $z_1$ is greater than
twice the s.e. it is significant. The assumption of a correlation of 0.7 in the
population is thus ruled out.

(ii) Corresponding to $r_1 = 0.4$ and $r_2 = 0.6$ we have $z_1 = 0.424$ and
$z_2 = 0.693$, giving $z_2 - z_1 = 0.27$ nearly. The s.e. of the difference of the $z$'s
is given by

$$e = \sqrt{\tfrac{1}{25} + \tfrac{1}{16}} = \sqrt{0.1025} = 0.32.$$

The difference $z_2 - z_1$, being less than $e$, is not significant. The samples may
thus be regarded as from the same population.
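Both parts of this example can be checked without tables, since Fisher's $z$ is the inverse hyperbolic tangent of $r$. The following Python sketch is illustrative; the function names are assumptions.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation, z = artanh(r)."""
    return math.atanh(r)


def z_standard_error(n: int, k: int) -> float:
    """Approximate s.e. of z for a partial correlation of order k."""
    return 1.0 / math.sqrt(n - k - 3)


def z_difference(r1: float, n1: int, r2: float, n2: int, k: int):
    """Difference z2 - z1 and its s.e. for two independent samples."""
    diff = fisher_z(r2) - fisher_z(r1)
    se = math.sqrt(1.0 / (n1 - k - 3) + 1.0 / (n2 - k - 3))
    return diff, se


# Part (i): sample value 0.4 against an assumed population value 0.7.
deviation = fisher_z(0.7) - fisher_z(0.4)
se_single = z_standard_error(32, 4)
print(round(deviation, 3), round(se_single, 3))

# Part (ii): difference between the two samples.
diff, se_diff = z_difference(0.4, 32, 0.6, 23, 4)
print(round(diff, 2), round(se_diff, 2))
```

In part (i) the deviation exceeds twice its s.e., while in part (ii) the difference is smaller than its s.e., in agreement with the conclusions of the example.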

118. Significance of an observed multiple correlation 

Consider next the significance of an observed multiple correlation
coefficient, $R$, of the variable $x_1$ with the $p$ variables $x_2, \ldots, x_{p+1}$,
calculated from a random sample of $N$ sets of values from a multivariate
normal population. Fisher has found the general sampling
distribution* of $R$, and has shown that it depends, not on the whole
matrix of correlations between the variables, but simply on the
multiple correlation in the population and the sample size, $N$.
In particular, when the multiple correlation coefficient in the
population is zero,† $R^2$ is a $\beta_1(\tfrac12 p, \tfrac12(N - p - 1))$ variate. In virtue
of Theorem vii of §72 it follows that, in this case, $R^2/(1 - R^2)$ is a
$\beta_2(\tfrac12 p, \tfrac12(N - p - 1))$ variate; and therefore, by the argument of §92,
the statistic $F$ defined by

$$F = \frac{R^2}{1 - R^2} \cdot \frac{N - p - 1}{p} \qquad (46)$$

conforms to the $F$ distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1. \qquad (47)$$

To test the hypothesis that the multiple correlation in the population
is zero, we have only to determine from the table of $F$ whether
the value of this statistic calculated from the sample is significant.
This decides the significance of the observed multiple correlation.
We may also remark that, since $R^2$ is a $\beta_1(\tfrac12 p, \tfrac12(N - p - 1))$ variate
in samples from a normal population in which $x_1$ is uncorrelated
with any of the $p$ variables $x_2, \ldots, x_{p+1}$, the mean value of $R^2$ is
given by

$$E(R^2) = p/(N - 1).$$

The problem may also be approached from the point of view of
analysis of variance and covariance. The estimate $e_{1.(p)}$ of $x_1$, given
by the regression equation of $x_1$ on the $p$ variables $x_2, \ldots, x_{p+1}$, is
of the form

$$e_{1.(p)} = b_{12} x_2 + b_{13} x_3 + \ldots + b_{1(p+1)} x_{p+1}, \qquad (48)$$

and this is connected with the corresponding deviation $x_{1.(p)}$ by
the equation

$$x_1 = e_{1.(p)} + x_{1.(p)}. \qquad (49)$$

* Fisher, 1928, 1. See also Wilks, 1932, 1.

† See also Fisher, 1924, 5.



118] Multiple Correlation 259 

The $N$ values of $x_{1.(p)}$ are connected by the $p + 1$ normal equations

$$\sum x_{1.(p)} = 0, \qquad \sum x_i x_{1.(p)} = 0 \quad (i = 2, \ldots, p + 1). \qquad (50)$$

Squaring (49) and summing over the distribution we see that

$$\sum x_1^2 = \sum x_{1.(p)}^2 + \sum e_{1.(p)}^2, \qquad (51)$$

the sum of products vanishing since

$$\sum x_{1.(p)} e_{1.(p)} = \sum x_{1.(p)} (b_{12} x_2 + b_{13} x_3 + \ldots) = 0$$

in virtue of (50). The sum in the first member of (51) is equal to
$N s_1^2$, where $s_1^2$ is the variance of $x_1$ in the sample; while the first
sum on the right has the value $N s_1^2 (1 - R^2)$ by (23'). Consequently

$$\sum e_{1.(p)}^2 = N R^2 s_1^2. \qquad (52)$$

The equations (50) may be regarded as $p + 1$ linear constraints on
the $N$ values $x_{1.(p)}$. Then, on the assumption that these deviations
are normally distributed, and that $R$ is zero in the population, the
sum $\sum x_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with $N - p - 1$ d.f., $\sigma^2$ being the
variance of $x_1$ in the population. But, by §77, $\sum x_1^2/\sigma^2$ is similarly
distributed with $N - 1$ d.f.; and therefore, by Theorem II of §81,
$\sum e_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with $p$ d.f. Thus the two sums on
the right of (51), when divided by $N - p - 1$ and $p$ respectively, give
independent and unbiased estimates of $\sigma^2$. Inserting their values
in terms of $R^2$ we see that the quotient of these estimates

$$F = \frac{R^2}{1 - R^2} \cdot \frac{N - p - 1}{p} \qquad (53)$$

conforms to the $F$ distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1.$$

We thus arrive at the same result as before.* (See also Ex. XII, 13.)

Example. In a sample of 25 sets of values from a normal population
$R_{1(234)}$ was found to be 0.4. Show that this is not significant of correlation
in the population between $x_1$ and the variables $x_2$, $x_3$, $x_4$.

Here $p = 3$, $\nu_1 = 3$, $\nu_2 = 21$. Hence

$$F = \frac{0.16}{0.84} \times \frac{21}{3} = \frac{4}{3} = 1.33.$$

This is not significant, since the 5 % value of $F$ is about 3.1.
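The calculation in this example can be sketched in Python as below. The function names are assumptions, and the code only forms the statistic, its degrees of freedom and the null mean of $R^2$; the percentage point of $F$ must still be read from a table.

```python
def multiple_corr_f(big_r: float, n: int, p: int):
    """Return (F, nu1, nu2) for testing a zero multiple correlation.

    big_r : observed multiple correlation R
    n     : number of sets of values in the sample
    p     : number of variables with which x1 is correlated
    """
    nu1, nu2 = p, n - p - 1
    f_stat = (big_r**2 / (1.0 - big_r**2)) * nu2 / nu1
    return f_stat, nu1, nu2


def mean_r_squared_under_null(n: int, p: int) -> float:
    """Mean value of R^2 when the population multiple correlation is zero."""
    return p / (n - 1)


f_stat, nu1, nu2 = multiple_corr_f(0.4, 25, 3)
print(round(f_stat, 2), nu1, nu2)  # 1.33 3 21
print(mean_r_squared_under_null(25, 3))  # 0.125
```

The null mean 0.125 is a reminder that a sample value of $R^2$ as large as 0.16 is not far from what chance alone would be expected to produce with so few observations.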


* For the distributions of various statistics occurring in this chapter see
Bartlett, 1933, 3.




COLLATERAL READING 
Yule, 1907, 1.
Yule and Kendall, 1937, 1, chapter xiv.
Rietz, 1927, 2, pp. 92-102 and 106-7.
Mills, 1938, 1, chapter xvi.
Rider, 1939, 6, §§20, 28, 29, 45-8.
Kenney, 1939, 3, part II, chapter v.
Ezekiel, 1930, 2, chapters x-xxiii.
Goulden, 1939, 2, chapter viii.
Camp, 1934, 1, part II, chapter vi.
Snedecor, 1938, 3, chapter xiii.
Tippett, 1931, 2, chapter xi.
Pearson, K., 1901, 1 and 1915, 1.
Fisher, 1924, 4 and 1928, 1.
Bartlett, 1933, 3.
Wilks, 1932, 1.


EXAMPLES XII 

1. In a trivariate distribution it is found that

$$\sigma_1 = 3, \quad \sigma_2 = 4, \quad \sigma_3 = 5; \qquad r_{23} = 0.40, \quad r_{31} = 0.60, \quad r_{12} = 0.70.$$

Prove that the partial correlations are

$$r_{23.1} = -0.035, \quad r_{31.2} = 0.49, \quad r_{12.3} = 0.63,$$

and that, if the variates are measured from their means, the linear
regression equations are

$$x_1 = 0.41 x_2 + 0.23 x_3, \quad x_2 = 0.96 x_1 - 0.025 x_3, \quad x_3 = 1.04 x_1 - 0.05 x_2.$$

Show also that

$$\sigma_{1.23} = 1.87, \quad \sigma_{2.31} = 2.85, \quad \sigma_{3.12} = 4.00,$$

$$R_{1(23)} = 0.78, \quad R_{2(31)} = 0.70, \quad R_{3(12)} = 0.60.$$

2. Show that

$$\sigma_{1.23} \sigma_{2.31}/r_{12.3} = -\omega \sigma_1 \sigma_2/\omega_{12},$$

and, more generally,

$$\sigma_{1.23(k)} \sigma_{2.31(k)}/r_{12.3(k)} = -\omega \sigma_1 \sigma_2/\omega_{12}.$$



3. From §114 (38), deduce that

$$1 - r_{1n.23\ldots(n-1)}^2 = (1 - R_{1(23\ldots n)}^2)/(1 - R_{1(23\ldots(n-1))}^2),$$

which shows how a partial correlation coefficient of order $n - 2$ may
be expressed in terms of multiple correlation coefficients of orders
$n - 1$ and $n - 2$.

4. Prove the identity

$$b_{12.3} b_{23.1} b_{31.2} = r_{12.3} r_{23.1} r_{31.2}.$$

5. Prove the formula

$$b_{12} = (b_{12.3} + b_{13.2} b_{32.1})/(1 - b_{13.2} b_{31.2}),$$

expressing a regression coefficient in terms of coefficients of higher
order. Write down the corresponding formula with subscripts 1
and 2 interchanged, multiply together the two equations and take
the square root, thus obtaining

$$r_{12} = (r_{12.3} + r_{13.2} r_{23.1})/\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)},$$

expressing a correlation coefficient in terms of coefficients of higher
order.

6. Prove the formulae of Ex. 5 with a group $(k)$ of secondary
subscripts added to each of the coefficients.

7. Verify the values of $\phi$ expressed by §116 (43) and (44).

8. Show that, if $x_3 = a x_1 + b x_2$, the three partial correlations are
numerically equal to unity, $r_{13.2}$ having the sign of $a$, $r_{23.1}$ the sign
of $b$, and $r_{12.3}$ the opposite sign to $a/b$.

9. For a sample of 30 sets of values from a normal population,
$R_{1(23)}$ is found to be 0.5. Show that this is significant of correlation
in the population between $x_1$ and $x_2$, $x_3$. ($F = 4.5$, $\nu_1 = 2$, $\nu_2 = 27$.)

10. Show that a partial correlation ri 2.84 = ^ a sample of 

20 sets of values from a normal population, is significant at the 5 % 
level. 

11. Two independent samples, of 46 and 36 sets of values, give
corresponding partial correlations of order three as 0.41 and 0.66
respectively. Show that this is not inconsistent with the assumption
that the samples are from the same normal population. Show also
that an estimate of the partial correlation in the population, by the
method of §96, is 0.53.

12. Show that, corresponding to the first sample in §117, Ex. 2,
the 95 % fiducial limits for the partial correlation in the population
are 0.03 and 0.67.

13. Deduce from the argument on p. 259 that, when the multiple
correlation in the normal population is zero, $R^2/(1 - R^2)$
is a $\beta_2(\tfrac12 p, \tfrac12(N - p - 1))$ variate and therefore, by §72, $R^2$ is a
$\beta_1(\tfrac12 p, \tfrac12(N - p - 1))$ variate.

14. With the notation of §118 show that, when the multiple
correlation in the normal population is zero, the expected value of
$R$ in the sample is $\Gamma(\tfrac12(p + 1)) \Gamma(\tfrac12(N - 1)) / \{\Gamma(\tfrac12 p) \Gamma(\tfrac12 N)\}$.



LITERATURE FOR REFERENCE 


1875, 1. Helmert, F. R. Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler. Zeit. für Math. und Physik, Bd. 20, S. 300-3. See also Astronomische Nachrichten, Bd. 85, no. 2039.

1876, 1. Helmert, F. R. Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler, und über einige damit in Zusammenhange stehende Fragen. Zeit. für Math. und Physik, Bd. 21, S. 192-218. See also Astronomische Nachrichten, Bd. 88, no. 2096-7.

1894, 1. Pearson, Karl. Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. A, vol. 185, p. 71.

1896, 1. Pearson, Karl. Regression, heredity and panmixia. Phil. Trans. Roy. Soc. A, vol. 187, p. 253.

1897, 1. Yule, G. U. On the theory of correlation. Journ. Roy. Stat. Soc. vol. 60, p. 812.

1898, 1. Pearson, Karl and Filon, L. N. G. On the probable errors of frequency constants, and on the influence of random selection on variation and correlation. Phil. Trans. Roy. Soc. A, vol. 191, p. 229.
      2. Sheppard, W. F. On the calculation of the most probable values of the frequency constants, from data arranged according to equidistant divisions of the scale. Proc. Lond. Math. Soc. vol. 29, p. 353.

1900, 1. Pearson, Karl. On the criterion that a given system of deviations from the probable, in the case of a correlated system of variables, is such that it can be reasonably supposed to have arisen from random sampling. Phil. Mag. vol. 50, pp. 157-75.

1901, 1. Pearson, Karl. On lines and planes of closest fit to systems of points in space. Phil. Mag. Series 6, vol. 2, p. 559.
      2. Pearson, Karl. On the systematic fitting of curves to observations and measurements. Biometrika, vol. 1, p. 265 and vol. 2, p. 1.

1902, 1. Pearson, Karl. On the probable errors of frequency constants. Biometrika, vol. 2, pp. 273-81.

1903, 1. Thiele, T. N. Theory of Observations. London: Layton.

1905, 1. Pearson, Karl. On the General Theory of Skew Correlation and non-linear Regression. Drapers' Co. Research Memoirs, Biometric Series II.

1907, 1. Yule, G. U. On the theory of correlation for any number of variables, treated by a new system of notation. Proc. Roy. Soc. A, vol. 79, pp. 182-93.



1908, 1. 'Student' (Gosset, W. S.). On the probable error of a mean. Biometrika, vol. 6, pp. 1-25.
      2. 'Student'. On the probable error of a correlation coefficient. Ibid. p. 302.

1911, 1. Pearson, Karl. On the correction necessary for the correlation ratio. Biometrika, vol. 8, p. 254.

1912, 1. Poincaré, H. Calcul des Probabilités. Paris: Gauthier-Villars.

1913, 1. Pearson, Karl. On the probable errors of frequency constants. Biometrika, vol. 9, pp. 1-10.
      2. Pearson, Karl. On the influence of 'broad categories' on correlation. Ibid. pp. 116-39.
      3. Harris, J. A. On the calculation of intra-class and inter-class coefficients of correlation from class-moments, when the number of possible combinations is large. Ibid. pp. 446-72.

1915, 1. Pearson, Karl. On the partial correlation ratio. Proc. Roy. Soc. A, vol. 91, p. 492.
      2. Fisher, R. A. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, vol. 10, pp. 507-21.

1920, 1. Pearson, Karl. On the fundamental problem of practical statistics. Biometrika, vol. 13, p. 1.
      2. Bowley, A. L. Elements of Statistics. Fourth edition. London.

1921, 1. Fisher, R. A. On the 'probable error' of a coefficient of correlation deduced from a small sample. Metron, vol. 1, part 4, pp. 3-32.
      2. Pearson, Karl. On a general method of determining the successive terms in a skew regression line. Biometrika, vol. 13, p. 296.

1922, 1. Fisher, R. A. On the interpretation of $\chi^2$ from contingency tables, etc. Journ. Roy. Stat. Soc. vol. 85, pp. 87-94.
      2. Fisher, R. A. The goodness of fit of regression formulae, and the distribution of regression coefficients. Ibid. pp. 597-612.
      3. Pearson, Karl. On the $\chi^2$ test of goodness of fit. Biometrika, vol. 14, pp. 186 and 418.
      4. Miner, J. R. Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for use in Partial Correlation and Trigonometry. Baltimore: Johns Hopkins Press.
      5. Pearson, Karl (ed.). Tables of the Incomplete Gamma Function. London: H.M. Stationery Office.

1923, 1. Kelley, T. L. Statistical Method. New York: The Macmillan Co.

1924, 1. Rietz, H. L. (ed.). A Handbook of Mathematical Statistics. Boston: Houghton Mifflin.
      2. Fisher, R. A. On the distribution yielding the error functions of several well known statistics. Proc. Internat. Math. Cong. Toronto, vol. 2, pp. 805-13.
      3. Jones, D. C. A First Course in Statistics. Second edition. London: G. Bell and Sons.



1924, 4. Fisher, R. A. The distribution of the partial correlation coefficient. Metron, vol. 3, pp. 329-32.
      5. Fisher, R. A. The influence of rainfall on the yield of wheat at Rothamsted. Phil. Trans. Roy. Soc. B, vol. 213, pp. 89-142.

1925, 1. Fisher, R. A. Applications of Student's distribution. Metron, vol. 5, part 3, pp. 90-104.
      2. Coolidge, J. L. An Introduction to Mathematical Probability. Oxford: Clarendon Press.
      3. Lévy, P. Calcul des Probabilités. Paris.

1927, 1. Burgess, R. W. Introduction to the Mathematics of Statistics. Boston: Houghton Mifflin.
      2. Rietz, H. L. Mathematical Statistics. Carus Mathematical Monographs, no. 3. Chicago: Open Court.

1928, 1. Fisher, R. A. The general sampling distribution of the multiple correlation coefficient. Proc. Roy. Soc. A, vol. 121, pp. 654-73.

1929, 1. Fisher, R. A. Moments and product moments of sampling distributions. Proc. Lond. Math. Soc. vol. 30, pp. 199-238.
      2. Wishart, J. The correlation between product moments of any order in samples from a normal population. Proc. Roy. Soc. Edin. vol. 49, pp. 1-13.

1930, 1. Rider, P. R. A survey of the theory of small samples. Annals of Math. vol. 31, pp. 577-628.
      2. Ezekiel, M. Methods of Correlation Analysis. New York: Wiley.
      3. Fisher, R. A. Inverse probability. Proc. Camb. Phil. Soc. vol. 26, pp. 528-35.

1931, 1. Irwin, J. O. Mathematical theorems involved in analysis of variance. Journ. Roy. Stat. Soc. vol. 94, pp. 284-300.
      2. Tippett, L. H. C. The Methods of Statistics. London: Williams and Norgate.
      3. Neyman, J. and Pearson, E. S. On the problem of k samples. Bull. Acad. Polonaise des Sciences et des Lettres, Series A, p. 460.
      4. Thiele, T. N. Theory of Observations. English version, reprinted in Annals of Math. Stat. vol. 2, p. 165.

1932, 1. Wilks, S. S. On the sampling distribution of the multiple correlation coefficient. Annals of Math. Stat. vol. 3, pp. 196-203.

1933, 1. Fisher, R. A. The concepts of inverse probability and fiducial probability referring to unknown parameters. Proc. Roy. Soc. A, vol. 139, pp. 343-8.
      2. Aitken, A. C. On the graduation of data by the orthogonal polynomials of least squares. Proc. Roy. Soc. Edin. vol. 53, pp. 54-78.
      3. Bartlett, M. S. On the theory of statistical regression. Ibid. pp. 260-83.



1934, 1. Camp, B. H. The Mathematical Part of Elementary Statistics. New York: Heath.
      2. Snedecor, G. W. Calculation and Interpretation of Analysis of Variance and Covariance. Iowa: Collegiate Press.
      3. Wishart, J. Statistics in agricultural research. Supp. to Journ. Roy. Stat. Soc. vol. 1, pp. 26-51.
      4. Irwin, J. O. Independence of the constituent items in the analysis of variance. Ibid. pp. 236-61.
      5. Yates, F. The analysis of multiple classifications with unequal numbers in different classes. Journ. Amer. Stat. Assoc. vol. 29, p. 66.
      6. Cochran, W. G. The distribution of quadratic forms in a normal system, etc. Proc. Camb. Phil. Soc. vol. 30, pp. 178-91.

1935, 1. Fisher, R. A. The mathematical distributions used in the common tests of significance. Econometrica, vol. 3, pp. 353-65.
      2. Fisher, R. A. The fiducial argument in statistical inference. Annals of Eugenics, vol. 6, pp. 391-8.

1936, 1. Levy, H. and Roth, L. Elements of Probability. Oxford: Clarendon Press.
      2. Wishart, J. Tests of significance in analysis of covariance. Supp. to Journ. Roy. Stat. Soc. vol. 3, pp. 79-82.
      3. Wishart, J. and Sanders, H. G. Principles and Practice of Field Experimentation. Empire Cotton-Growing Corporation.

1937, 1. Yule, G. U. and Kendall, M. G. Introduction to the Theory of Statistics. Eleventh edition. London: Griffin and Co.
      2. Uspensky, J. V. Introduction to Mathematical Probability. New York: McGraw-Hill.
      3. Pitman, E. J. G. Significance tests which may be applied to samples from any population. Supp. to Journ. Roy. Stat. Soc. vol. 4, pp. 119-30 and 225-32.
      4. Yates, F. The Design and Analysis of Factorial Experiments. Imperial Bureau of Soil Science, Harpenden, Herts.
      5. Rietz, H. L. Some topics in sampling theory. Bull. Amer. Math. Soc. vol. 43, pp. 209-30.
      6. Cramér, H. Random Variables and Probability Distributions. Cambridge: University Press.
      7. Cornish, E. A. and Fisher, R. A. Moments and cumulants in the specification of distributions. Revue de l'Institut International de Statistique, vol. 4, pp. 1-14.
      8. Pitman, E. J. G. The 'closest' estimates of statistical parameters. Proc. Camb. Phil. Soc. vol. 33, pp. 212-22.

1938, 1. Mills, F. C. Statistical Methods. Revised edition. London: Isaac Pitman.
      2. Fisher, R. A. Statistical Methods for Research Workers. Seventh edition. Edinburgh: Oliver and Boyd.
      3. Snedecor, G. W. Statistical Methods. Revised edition. Iowa: Collegiate Press.



1938, 4. Yates, F. Orthogonal functions and tests of significance in the analysis of variance. Supp. to Journ. Roy. Stat. Soc. vol. 5, pp. 177-80.
      5. Kendall, M. G. The conditions under which Sheppard's corrections are valid. Journ. Roy. Stat. Soc. vol. 101, pp. 592-606.
      6. Rietz, H. L. On a recent advance in statistical inference. Amer. Math. Monthly, vol. 45, pp. 149-58.
      7. Fisher, R. A. and Yates, F. Statistical Tables. Edinburgh: Oliver and Boyd.

1939, 1. Aitken, A. C. Statistical Mathematics. Edinburgh: Oliver and Boyd.
      2. Goulden, C. H. Methods of Statistical Analysis. New York: Wiley and Sons.
      3. Kenney, J. F. Mathematics of Statistics. New York: Van Nostrand.
      4. Kurtz, A. K. and Edgerton, H. A. Statistical Dictionary of Terms and Symbols. New York: Wiley and Sons.
      5. Jeffreys, H. Theory of Probability. Oxford: Clarendon Press.
      6. Rider, P. R. An Introduction to Modern Statistical Methods. New York: Wiley and Sons.
      7. Smith, J. H. Tests of Significance. What they mean and how to use them. Chicago: University Press.

1940, 1. Plummer, H. C. Probability and Frequency. London: Macmillan.
      2. Peters, C. C. and Van Voorhis, W. R. Statistical Procedures and their Mathematical Bases. New York: McGraw-Hill.
      3. Sawkins, D. T. Elementary presentation of the frequency distributions of certain statistical populations associated with the normal population. Journ. and Proc. Roy. Soc. N.S.W. vol. 74, pp. 209-39.
      4. Kendall, M. G. Note on the distribution of quantiles for large samples. Supp. to Journ. Roy. Stat. Soc. vol. 7, pp. 83-5.
      5. Wishart, J. Field trials. Their lay-out and statistical analysis. Imperial Bureau of Plant Breeding and Genetics, School of Agriculture, Cambridge.
      6. Pitman, E. J. G. Tests of hypotheses concerning location and scale parameters. Biometrika, vol. 31, pp. 200-15.
      7. Hsu, P. L. On generalized Analysis of Variance. Ibid. pp. 221-37.
      8. Kendall, M. G. Some properties of k-statistics. Annals of Eugenics, vol. 10, pp. 106-11, 215-22 and 392-402.

1941, 1. Sawkins, D. T. Remarks on goodness of fit of hypotheses and on Pearson's $\chi^2$-test. Journ. and Proc. Roy. Soc. N.S.W. vol. 76, pp. 85-96.
      2. Kendall, M. G. A theory of Randomness. Biometrika, vol. 32, pp. 1-15.
      3. Bartlett, M. S. The statistical significance of canonical correlations. Ibid. pp. 29-37.
      4. Hsu, P. L. Analysis of Variance from the power function standpoint. Ibid. pp. 62-9.



268 Literature for Beference 

1941, 5. Nbyman, J. Fiducial argument and the theory of confidence 

intervals. Ibid, pp. 128-50. 

6. Hotellino, H. Experimental determination of the maximum of a 

function. Ann. Math. Stat. vol. 12, pp. 20-45. 

7. Wilks, S. S. Determination of sample sizes for setting tolerance 

limits. Ibid. pp. 91-96. 

8. Wald, A. and Brookner, R. J. On the distribution of Wilks' 

statistic. Ibid. pp. 137-52. 

1942, 1. Haldane, J. B. S. The mode and median of a nearly normal distribution 
with given cumulants. Biometrika, vol. 32, pp. 294-9. 

2. Hartley, H. O. The range in random samples. Ibid. pp. 334-48. 

3. Geary, R. C. Inherent relations between random variables. Proc. 

Royal Irish Acad. vol. 47, pp. 63-76. 

4. Neyman, J. Basic ideas of the theory of testing statistical hypotheses. 
Journ. Roy. Stat. Soc. vol. 105, pp. 292-327. 

5. Aitken, A. C. and Silverstone, H. On the estimation of statistical 
parameters. Proc. Roy. Soc. Edinb. A, vol. 61, pp. 186-94. 

6. Anderson, R. L. Distribution of the serial correlation coefficient. 

Ann. Math. Stat. vol. 13, pp. 1-13. 

7. Wald, A. Setting of tolerance limits when the sample is large. 

Ibid. pp. 389-99. 

8. Wilks, S. S. Statistical prediction with special reference to the 

problem of tolerance limits. Ibid. pp. 400-9. 

1943, 1. Wilks, S. S. Mathematical Statistics. Princeton University Press. 

2. Kendall, M. G. The Advanced Theory of Statistics. Vol. 1. London: 

Griffin and Co. 

3. Vajda, S. The algebraic analysis of contingency tables. Journ. 

Roy. Stat. Soc. vol. 106, pp. 333-42. 

4. Wald, A. An extension of Wilks' method for setting tolerance 

limits. Ann. Math. Stat. vol. 14, pp. 45-55. 

5. Curtiss, J. H. On transformations used in the Analysis of 

Variance. Ibid. pp. 107-22. 

6. Mises, R. v. On the problem of testing hypotheses. Ibid. pp. 238-52. 

1944, 1. Sawkins, D. T. Simple regression and correlation. Journ. and 
Proc. Roy. Soc. N.S.W. vol. 77, pp. 85-95. 

2. Kendall, M. G. On autoregressive time series. Biometrika, vol. 33, 

pp. 105-22. 

3. Tippett, L. H. C. The control of industrial processes subject to 

trends in quality. Ibid. pp. 163-72. 

4. Hartley, H. O. Studentization. Ibid. pp. 173-80. 

5. Wald, A. On a statistical problem arising in the classification of 
an individual into one of two groups. Ann. Math. Stat. vol. 15, 
pp. 145-62. 

6. Wald, A. and Wolfowitz, J. Statistical tests based on permuta¬ 

tions of observations. Ibid. pp. 358-72. 

7. Gumbel, E. J. Ranges and midranges. Ibid. pp. 414-22. 




1945, 1. Kendall, M. G. The treatment of ties in ranking problems. 

Biometrika, vol. 33, pp. 239-61. 

2. Pearson, E. S., Godwin, H. J. and Hartley, H. O. The probability 
integral of the mean deviation. Ibid. pp. 262-65. 

3. Aitken, A. C. On linear approximation by least squares. Proc. 
Roy. Soc. Edinb. A, vol. 62, pp. 133-46. 

4. Hsu, P. L. The approximate distributions of the mean and 

variance of a sample of independent variables. Ann. Math. 
Stat. vol. 16, pp. 1-29. 

5. Wald, A. Sequential tests of statistical hypotheses. Ibid. pp. 117-86. 

6. Feller, W. On the normal approximation to the binomial distribution. 
Ibid. pp. 319-29. 

7. Bronowski, J. and Neyman, J. The variance of the measure of 

a two-dimensional random set. Ibid. pp. 339-41. 

1946, 1. Cramér, H. Mathematical Methods of Statistics. Princeton 

University Press. 

2. Kendall, M. G. The Advanced Theory of Statistics. Vol. 2. 

London: Griffin and Co. 

3. Yates, F. A review of recent statistical developments in sampling 
and sampling surveys. Journ. Roy. Stat. Soc. vol. 109, pp. 12-30. 

4. Bartlett, M. S. On the theoretical specification of sampling properties 
of autocorrelated time series. Supp. to Journ. Roy. Stat. 
Soc. vol. 8, pp. 27-41. 

5. Brown, G. W. and Tukey, J. W. Some distributions of sample 
means. Ann. Math. Stat. vol. 17, pp. 1-12. 

6. Girshick, M. A. Contributions to the theory of sequential 

analysis. Ibid. pp. 123-43 and 282-98. 

7. Wald, A. and Wolfowitz, J. Tolerance limits for a normal distribution. 
Ibid. pp. 208-15. 

8. Wilks, S. S. Sample criteria for testing equality of means, of 
variances and of covariances in a normal multivariate distribution. 
Ibid. pp. 257-81. 

1947, 1. Wald, A. Sequential Analysis. New York: Wiley and Sons. 

2. Hoel, P. G. Introduction to Mathematical Statistics. New York: 

Wiley and Sons. 

3. Bartlett, M. S. Multivariate Analysis. Supp. to Journ. Roy. Stat. 

Soc. vol. 9, pp. 176-90. 

4. Pearson, E. S. The choice of statistical tests. Biometrika, vol. 34, 

pp. 139-67. 

5. Geary, R. C. Testing for normality. Ibid. pp. 209-42. 

6. Hotelling, H. A generalized T measure of multivariate dispersion. 
Ann. Math. Stat. vol. 18, p. 298. 

1948, 1. Pitman, E. J. G. Lecture notes on Non-parametric Statistical 

Inference. University of North Carolina, Institute of Statistics. 
(Mimeographed.) 




1948, 2. Wold, H. O. A. Random Normal Deviates. Tracts for Computers, 

No. 24. Cambridge University Press. 

3. Kendall, M. G. Rank Correlation Methods. London: Griffin and Co. 

4. Hartley, H. O. The estimation of non-linear parameters by 
‘internal least squares’. Biometrika, vol. 36, pp. 32-46. 

5. Radhakrishna Rao, C. Tests of significance in multivariate 
analysis. Ibid. pp. 68-79. 

6. Stevens, W. L. Control of gauging. Journ. Roy. Stat. Soc. B, 

vol. 10, pp. 64-98. 

7. Cochran, W. G. and Bliss, C. I. Discriminant functions with 
covariance. Ann. Math. Stat. vol. 19, pp. 151-76. 

8. Wald, A. and Wolfowitz, J. Optimum character of the sequential 
probability ratio test. Ibid. pp. 326-39. 

1949, 1. David, F. N. Probability Theory for Statistical Methods. 
Cambridge University Press. 

2. Neyman, J. (Ed.). Proceedings of the Berkeley Symposium on 

Mathematical Statistics and Probability. University of California 
Press. 

3. Barnard, G. A. Statistical Inference. Journ. Roy. Stat. Soc. B, 
vol. 11, pp. 115-39. 

4. Moyal, J. E., Bartlett, M. S. and Kendall, D. G. Symposium 
on stochastic processes. Ibid. pp. 160-264. 

5. Wishart, J. Cumulants of multivariate multinomial distributions. 
Biometrika, vol. 36, pp. 47-68. 

6. Lancaster, H. O. The derivation and partition of χ² in certain 

discrete distributions. Ibid. pp. 117-29 and 370-82. 

7. Walsh, J. E. On significance tests for the median. Ann. Math. 
Stat. vol. 20, pp. 64-81. 

8. Wald, A. Statistical decision functions. Ibid. pp. 166-205. 

1950, 1. Feller, W. An Introduction to Probability Theory and its Applications. 
New York: Wiley and Sons. 

2. Fisher, R. A. Contributions to Mathematical Statistics. New York: 

Wiley and Sons. 

3. Wald, A. Statistical Decision Functions. New York: Wiley and 

Sons. 

4. Deming, W. E. Some Theory of Sampling. New York: Wiley and 

Sons. 

5. Mood, A. McF. Introduction to the Theory of Statistics. New York: 
McGraw-Hill. 

6. Pearson, E. S. On questions raised by the combination of tests 

based on discontinuous distributions. Biometrika, vol. 37, 
pp. 383-98. 

7. Moran, P. A. P. Recent developments in ranking theory. Journ. 
Roy. Stat. Soc. B, vol. 12, pp. 153-62. See also pp. 292-6. 

8. Lehmann, E. L. Some principles of the theory of testing hypotheses. 
Ann. Math. Stat. vol. 21, pp. 1-26. 




1951, 1. Dixon, W. J. and Massey, F. J. Introduction to Statistical Analysis. 

New York: McGraw-Hill. 

2. Rider, P. R. The distribution of the range in samples from a 

discrete rectangular population. Journ. Amer. Stat. Ass. vol. 46, 
pp. 375-8 and 602-7. 

3. Barnard, G. A. The theory of information. Journ. Roy. Stat. 
Soc. B, vol. 13, pp. 46-64. 

4. Ramakrishnan, A. Some simple stochastic processes. Ibid. 

pp. 131-40. 

5. Kendall, M. G. Regression, structure and functional relationship. I. 
Biometrika, vol. 38, pp. 11-25. 

6. Keeping, E. S. A significance test for exponential regression. 
Ann. Math. Stat. vol. 22, pp. 180-98. 

1952, 1. Duncan, A. J. Quality Control and Industrial Statistics. Chicago: 

Richard D. Irwin. 

2. Williams, E. J. Some exact tests in multivariate analysis. 
Biometrika, vol. 39, pp. 17-31. See also pp. 65-81. 

3. Kendall, M. G. Regression, structure and functional relationship. II. 
Ibid. pp. 96-108. 

4. Tocher, K. D. On the concurrence of a set of regression lines. 
Ibid. pp. 109-17. 

5. Barnard, G. A. The frequency justification of certain sequential 

tests. Ibid. pp. 144-50. 

6. Cox, D. R. Estimation by double sampling. Ibid. pp. 217-27. 

7. Chernoff, H. and Scheffé, H. A generalization of the Neyman-Pearson 
fundamental lemma. Ann. Math. Stat. vol. 23, pp. 213-26. 

8. Cochran, W. G. The χ²-test of goodness of fit. Ibid. pp. 315-45. 

1953, 1. Cochran, W. G. Sampling Techniques. New York: Wiley and 
Sons. 

2. Doob, J. L. Stochastic Processes. New York: Wiley and Sons. 

3. Rider, P. R. The distribution of the product of ranges in samples 

from a rectangular population. Journ. Amer. Stat. Ass. vol. 48, 
pp. 646-9. See also pp. 826-30. 

4. Bartlett, M. S. Approximate confidence intervals. Biometrika, 

vol. 40, pp. 12-19 and 306-17. 

5. Anscombe, F. J. Sequential Estimation. Journ. Roy. Stat. Soc. B, 

vol. 16, pp. 1-21. 

6. Lindley, D. V. Statistical Inference. Ibid. pp. 30-65. 

7. Hotelling, H. New light on the correlation coefficient and its 

transforms. Ibid. pp. 193-225. 

8. McMillan, B. The basic theorems of information theory. Ann. 

Math. Stat. vol. 24, pp. 196-219. 

1954, 1. Savage, L. J. The Foundations of Statistics. New York: Wiley 
and Sons. 

2. Fisher, R. A. The Analysis of Variance with various binomial 
transformations. Biometrics, vol. 10, pp. 130-9. 




1954, 3. Dunnett, C. W. and Sobel, M. A bivariate generalization of 
Student’s t-distribution. Biometrika, vol. 41, pp. 153-69. 

4. Barnard, G. A. Simplified decision functions. Ibid. pp. 241-51. 

5. Whittle, P. On stationary processes in the plane. Ibid. pp. 434-49. 

6. James, A. T. Normal multivariate analysis and the orthogonal 

group. Ann. Math. Stat. vol. 25, pp. 40-75. 

7. Hartley, H. O. and David, H. A. Universal bounds for mean 
range and extreme observation. Ibid. pp. 85-99. 

8. Rutherford, R. S. G. On a contagious distribution. Ibid. 

pp. 703-13. 

1955, 1. Bartlett, M. S. An Introduction to Stochastic Processes. 
Cambridge University Press. 

2. Hannan, E. J. Exact tests for serial correlation. Biometrika, 

vol. 42, pp. 133-42. See also pp. 316-26. 

3. Watson, G. S. Serial correlation in regression analysis. I. Ibid. 

pp. 327-41. 

4. Yates, F. The use of transformations and maximum likelihood in 

the analysis of quantal experiments involving two treatments. 
Ibid. pp. 382-403. 

5. Jowett, G. H. The comparison of means of sets of observations 

from sections of independent stochastic series. Journ. Roy. 
Stat. Soc. B, vol. 17, pp. 208-27. 

6. Le Cam, L. An extension of Wald’s theory of statistical decision 

functions. Ann. Math. Stat. vol. 26, pp. 69-81. 

7. Kac, M., Kiefer, J. and Wolfowitz, J. On tests of normality 
and other tests of goodness of fit. Ibid. pp. 189-211. 

1956, 1. Bartholomew, D. J. A sequential test of randomness for events 
occurring in time or space. Biometrika, vol. 43, pp. 64-78. 

2. Daniels, H. E. The approximate distribution of serial correlation 
coefficients. Ibid. pp. 169-85. 

3. Quenouille, M. H. Notes on bias in estimation. Ibid. pp. 353-60. 

4. Watson, G. S. and Hannan, E. J. Serial correlation in regression 

analysis. II. Ibid. pp. 436-48. 

5. Barton, D. E. and David, F. N. Some notes on ordered random 
intervals. Journ. Roy. Stat. Soc. B, vol. 18, pp. 79-94. 

6. Mallows, C. L. Generalizations of Tchebycheff’s inequalities. 

Ibid. pp. 140-68. 

7. Kemp, C. D. and Kemp, A. W. Generalized hypergeometric distributions. 
Ibid. pp. 202-11. 

8. Chernoff, H. Large sample theory. Ann. Math. Stat. vol. 27, 
pp. 1-22. 



INDEX 


The numbers refer to the pages 


Absolute deviation, 7 
Addition of probabilities, 21-2, 36 
Additive property, of chi-square, 177; 
of cumulants, 41; of Gamma variates, 
161; of normal variates, 57-8, 61-3; 
of Poissonian variates, 49, 69 
Aitken, A. C., 42, 46, 68, 106, 126, 129, 
181, 206, 266, 267-9 
Analysis of variance, 209-26, 267-8, 
271; of covariance, 226-36 
Anderson, R. L., 268 
Anscombe, F. J., 271 
Area under normal probability curve, 
66 

Arithmetic mean, 1; see Mean 
Arrays, 76, 87, 93-4, 180-1, 221-2 
Asterisk for significance, 172, 217, 221 
At random, 21; see Random 
Attributes, sampling of, 109-16, 126 
Autocorrelated, 269 
Autoregressive, 268 

Barnard, G. A., 270-2 
Bartholomew, D. J., 272 
Bartlett, M. S., 183, 259-60, 266, 267, 
269-72 

Barton, D. E., 272 
Berkeley symposium, 270 
Bernoulli, J., 32-4 

Beta distributions and variates, 153-9 
Beta function, 147-9 
Binomial distribution, 28-9, 38, 46-7, 
110, 173 

Bivariate distributions, Chapters iv 
and v; b. populations, 138-43, 177-80, 
192-4, 200-6, 226-36 
Bliss, C. I., 270 

Bowley, A. L., 16, 68, 83, 141, 143, 264 

Bronowski, J., 269 

Brookner, R. J., 268 

Brown, G. W., 269 

Burgess, R. W., 266 

Camp, B. H., 16, 42, 68, 83, 106, 126, 
143, 252, 260, 266 
Cauchy’s distribution, 169 


Change of origin, 4, 213, 229; of unit, 
4, 6, 9, 75-6 

Characteristic function, 37, 39 
Chernoff, H., 271-2 

Chi-square, Chapter ix; tests, 170-6; 
table, 171 

Chords of circle, 43-4 
Class, 10, 76, 130; class frequency, 10, 
130, 173-4; class interval, 10, 41, 130 
Cochran, W. G., 160, 216, 266, 270-1 
Coefficient; see Correlation etc. 
Coefficient of regression, 71; distribution 
of, 179-80; significance of, 194 
Coefficient of variation, 7, 144 
Collateral reading; List at the end of 
each chapter 

Combination of estimates of r, 203-6 
Comparison of samples, 112-14, 122-3, 
190-2, 202-4 

Component distributions, 2, 8, 16 
Compound probability, 21-4, 35-6 
Conditional probability, 23 
Confidence intervals, 268, 271 
Confidence range and limits, 190 
Constraint, linear, 166-8, 175 
Contagious distribution, 272 
Continuous distributions, 12-16, 30-2, 
68-9, 74, 93-7, 120; c. variable, 3, 30-2 
Convergence in probability, 33 
Coolidge, J. L., 42, 68, 106, 129, 266 
Cornish, E. A., 41, 266 
Correlation coefficient, 71-83, 95-6, 
141-3; distribution of, 177-9; significance 
of, 193-4, 223-4; transformation 
of, 200-3; partial correlation, 
242, 250-2, 256-7; multiple correlation, 
242, 249-50, 257-9 
Correlation of means, 138-9 
Correlation of ranks, 79-80 
Correlation ratios, 89-96, 221-2; distribution 
of, 180-1 
Correlation table, 76-7, 86 
Covariance, 27, 68-9, 81; of class frequencies, 
132-3; of moments, 136, 
140-1; of means, 138; analysis of 
cov., 226-36 





Covariation, 226 
Cox, D. R., 271 
Cramér, H., 37, 266, 269 
Criterion, 209, 214, 226, 233 
Cumulants, 39-42, 64, 86, 120-1, 161 
Cumulative function, 39-41, 48, 64, 86, 
120-1, 161 
Curtiss, J. H., 268 
Curve fitting, 70, 99-106 
Curved regression lines, 99-106 
Curvilinear regression, 99-106 

Daniels, H. E., 272 
David, F. N., 270-2 
David, H. A., 272 
Deciles, 3, 67, 126 
Decision functions, 270-2 
Degrees of freedom, 167-8 et seq. 
Deltheil, R., 39 
Deming, W. E., 270 
Deviation, 4, 26, 31, 70, 72, 82, 243, 
247 

Discrete distribution, 12, 46-9, 67, 
119-20, 130-1; d. variable, 3 
Dispersion, 6 
Dixon, W. J., 271 
Doob, J. L., 271 
Dressel, 39 
Duncan, A. J., 271 
Dunnett, C. W., 272 

Eden, T., 240 
Edgerton, H. A., 267 
Ellipse of maximum probability, 107 
Empirical definition of probability, 
34-6 

Equally likely, 19-20 
Error, 216 et seq.; see Standard error 
Estimate of parameter, 118; of variance, 
131, 210, 216 et seq.; of covariance, 
140, 228, 234 

Estimate from regression equation, 
73-6, 242, 246, 249 
Excess, 16, 40, 161 
Exhaustive, 19 

Expectation; see Expected value 
Expected frequencies, 47, 173-4 
Expected value, 24-32 
Exponential distribution, 18, 37, 40, 
161; e. regression, 271 
Ezekiel, M., 83, 106, 126, 260, 266 


Favourable, 19 
Feller, W., 269-70 

Fiducial limits, 121-2, 189-90, 202; f. 
argument, 268 
Filon, L. N. G., 263 
Fisher, R. A., v, xv, 16, 39, 41, 68, 104, 
136, 162, 169-71, 174, 177, 181-2, 
184, 187-90, 193-4, 198, 200-2, 
204-6, 209, 236, 240, 266-8, 260, 
264-7, 270-1 

Fitting curves to data, 70, 99-106, 176 
Freedom, degrees of, 167-8 et seq. 
Frequency, 1; f. distributions, Chapter i; 
f. curve, 13; class f., 10, 130; 
expected f., 47, 173-4 
F statistic, 196-200; F table, 199; see 
Variance ratio 

Functional dependence, 26; f. relationship, 
90, 166-8, 271 

Gamma distributions and variates, 
149-61 

Gamma function, 140-8 

Gauging, 270 
Geary, R. C., 268-9 
Geometrical proof of chi-square distribution, 
182 
Girshick, M. A., 269 
Godwin, H. J., 269 
Goodness of fit, 173-6 
Gosset, W. S.; see ‘Student’ 

Goulden, C. H., 16, 68, 83, 105, 181, 236, 
260, 267 

Grouped distribution, 10-11, 41-2, 76-8 
Grouping error, 41 
Gumbel, E. J., 268 

Haldane, J. B. S., 268 
Hall, A. D., 16 
Hannan, E. J., 272 
Harmonic mean, 46, 163 
Harris, J. A., 105, 264 
Hartley, H. O., 268-70, 272 
Helmert, F. R., 164, 263 
Hoel, P. G., 269 

Homogeneity of estimates, 183, 203-5 
Homogeneous population, 210-12 et seq. 
Homoscedastic, 89, 96 
Hotelling, H., 268-9, 271 
Hsu, P. L., 267, 269 
Hypergeometric, 272 
Hyperplane, 167-8 
Hypersphere, 182 


Factorial moments, 18, 63 





Incomplete Beta function, 162 
Independent events, 23 
Independent variates, 26-7, 31-2, 38, 
41, 44, 81-2 et seq. 

Index of correlation, 102-3, 224, 239 
Industrial processes, 268; i. statistics, 
271 

Information theory, 271 
Interquartile range, 3, 127-8 
Interval dx, 13, 30, 45 et seq. 

Intraclass correlation, 97-9 
Inverse correlation, 80 
Irwin, J. O., 212, 236, 265-6 

Jacobian, 63, 165 
James, A. T., 272 
Jeffreys, H., 267 

Jones, D. C., 16, 58, 83, 105, 125, 143, 
264 

Jowett, G. H., 272 

Kac, M., 272 
Keeping, E. S., 271 
Kelley, T. L., 264 
Kemp, A. W., 272 
Kemp, C. D., 272 

Kendall, M. G., 16, 37, 39, 41-2, 58, 83, 
105, 109, 124-5, 127, 143, 145, 149, 
168, 181, 205, 256, 260, 266-71 
Kenney, J. F., 16, 42, 58, 83, 105, 125, 
181, 205, 260, 267 
Kiefer, J., 272 
Kurtosis, 15, 40, 161 
Kurtz, A. K., 267 
k-statistics, 267 

Lancaster, H. O., 270 

Large samples, 111-14, 122-4, 132-43, 
272 

Latin square, 217-21, 238 
Leake’s drill, 206 
Least squares, 70, 100, 269-70 
Le Cam, L., 272 
Lee, Alice, 248 
Lehmann, E. L., 270 
Levy, H., 42, 58, 266 
Lévy, P., 37, 39, 265 
Lexis, W., 116; Lexian series of trials, 
115-16 

Likelihood, 170 
Lindley, D. V., 271 
Linear constraint, 166-8, 176 
Linear function of variates, 83 


Linear functional relationship, 73 
Linear regression, 88-9, 93-6, 224-6, 
244, 255 

Lines of regression, 69-76, 88 

McMillan, B., 271 
Mallows, C. L., 272 
Massey, F. J., 271 
Mathematical Notes, 63-6 
Maximum likelihood, 170, 272 
Mean, 1, 15, 24, 46, 48, 119-21, 149, 
153, 157, 189-92, 198; sampling distribution 
of, 119-21, 135, 162-3 
Mean deviation, 7, 17, 54 
Mean square deviation, 6 
Median, 3, 5, 15, 125, 268, 270 
Mercer, W. B., 16 
Midrange, 268 

Mills, F. C., 16, 58, 83, 106, 125, 181, 
206, 236, 260, 266 
Miner, J. R., 264 
Mises, R. V., 268 

Mode, or modal value, 3, 6, 15, 149, 153, 
157, 164, 197, 268 

Moment generating function (m.g.f.), 
36-9, 47-8, 54, 67, 81-2, 106-7, 161 
Moments, 8, 13, 24, 30, 67-9 
Mood, A. McF., 270 
Moran, P. A. P., 270 
Moyal, J. E., 270 

Multiple correlation, 242, 249-50, 257-9; 
sampling distribution of, 257-9, 262 
Multiplication of probabilities, 21-4, 
35-6 

Multivariate distributions, Chapter xii; 
m. analysis, 269-72 
Mutually exclusive, 19 

Neyman, J., 183, 265, 268-71 
Normal correlation surface, 96-7 
Normal distribution, 16, 60-8, 64-6, 
96-7, 106-7, 255-6 

Normal equations, 71-2, 100, 102, 243, 
246 

Normal population, 118, 121-2, 125, 
137-8, 143-4, 169-70, 176, 177-81, 
186-206, 211, 256-9 
Normally correlated, 96 

Observed correlation, significance of, 
192-3, 201-6, 266-9; observed regression, 
194 

Odds against, or in favour of, 21 




Opposite event, 20 
Optimum value, 170 
Order of a coefficient, 244 et seq. 
Ordinates of the normal probability 
curve, 53 

Orthogonal group, 272 
Orthogonal, linear transformation, 
164-6, 169, 178, 208 

Parabolic curve of regression, 101, 108, 
239 

Parameter, 117-18, 149-61 
Partial correlation, 242, 250-2, 256-7 
Partition values, 3, 124-6 
Pearson, E. S., 183, 265, 269-71 
Pearson, Karl, v, 10, 16, 149, 153, 157, 
164, 173, 181, 248, 260, 263-4, 266 
Percentiles, 3 
Peters, C. C., 267 

Pitman, E. J. G., 160-1, 205, 266-7, 269 
Plane of regression, 246-7 
Plummer, H. C., 42, 58, 83, 105, 183, 267 
Poincaré, H., 42, 264 
Poisson, S. D.; distribution, 44, 47-9, 
59, 63-4; series of trials, 29, 114-16 
Poissonian sampling, 114-15, 128-9 
Polynomial regression, 99-105 
Population, 109, 130-1; see Normal 
population 
Precision, 110 
Primary subscripts, 244 
Principle of least squares, 70, 100 
Probability, Chapter ii; convergence 
in pr., 33; pr. curve, 30, 62; pr. 
density, 30, 39; pr. distribution, 
24-34, 80-2; pr. function, 30-1 

Quality control, 271 
Quantiles, 3, 267 
Quartiles, 3, 66-7, 126, 127 
Quenouille, M. H., 272 
Quotient of Gamma variates, 158-60; 
of normal variates, 159-60 

r and s table, 201 
Radhakrishna Rao, C., 270 
Ramakrishnan, A., 271 
Random selection, 21, 33; r. sampling, 
109, 117 

Range of sample, 195-6, 268, 271-2 
Rank correlation, 79-80, 270 
Ranking problems, 269; r. theory, 270 
Ratio; see Correlation ratio 


Rectangular distribution, 14, 30, 38, 44; 
r. population, 271 
Reduction formulae, 64, 259-4 
Reed, L. J., 85 

Regression coefficient, 71; distribution 
of, 179-80; significance of, 194 
Regression, lines of, 69-76, 88; curvilinear 
r., 99-105; r. function, 223-4 
Related regressions, 104-5 
Relative frequency, 8, 13, 33-6, 47, 69, 
130; r.f. density, 13, 68 
Repeated trials, 28-9, 110, 114-16, 
128-9 

Residual, 242, 245 et seq. 

Residual sum of squares, 230, 234-5 
Rider, P. R., 16, 68, 83, 106, 125, 181, 
206, 236, 260, 266, 267, 271 
Rietz, H. L., 16, 42, 68, 83-4, 95, 106, 
107, 126, 143, 206, 266, 260, 264-7 
Root-mean-square deviation, 6 
Roth, L., 42, 58, 266 
Rutherford, R. S. G., 272 

Sampling distribution, 110, 118; of 
mean, 119-21, 162-3; of variance, 
162-3, 169-70; of correlation coefficient, 
177-9, 200-1, 256-9; of regression 
coefficient, 179-80; of correlation 
ratio, 180-1; of t, 186-7; of F, 
196-8; of z, 198, 200-3 
Sampling theory, Chapter vi et seq. 
Also 269-271 

Sanders, H. G., 221, 226, 228, 230, 236, 
239, 266 

Savage, L. J., 271 

Sawkins, D. T., xiv, xv, 164, 160, 169, 
177, 181, 206, 267-8 
Scatter diagram, 67 
Scheffé, H., 271 
Secondary subscripts, 244 
Seminvariants, 39 
Sequential, 269-72 
Serial correlation, 268, 272 
Sheppard, W. F., 263; Sheppard’s adjustments, 
11, 17, 41-2 
Significance, test of, 111, 170-3, 186-206; 
sig. of observed correlation, 
192-3, 201-6, 266-9; of observed 
regression, 194 
Significant, 111 
Silverstone, H., 268 
Simple sampling, 110, 117 
Skewness, 10, 16, 161 





Small samples, Chapter x 
Smith, J. H., 206, 267 
Snedecor, G. W., xv, 83, 126, 181, 
198-9, 236, 260, 266 
Sobel, M., 272 

Standard deviation (s.d.), 6, 244 
Standard error (s.e.), 73, 110-16, 118, 
122-6, 130-46; of class frequency, 
131-2; of correlation coeff., 141-3, 
146, 160; of coeff. of variation, 144; 
of covariance, 141; of moments, 
133-6, 140, 144-6; of variance, 136; 
of s.d., 137-8, 144 
Statistic, 131, 170, 185-6 
Statistical definition of probability, 
34-6 

Statistical independence, 26-7 
Stevens, W. L., 270 
Stirling’s formula for n!, 64 
Stochastic, 270-2 
Structure, 271 

‘Student’ (W. S. Gosset), 186, 206, 264 
Studentization, 268 
Success, 28, 110 

Sum of normal variates, 57-8, 61-3; of 
Gamma variates, 151; of Poissonian 
variates, 49, 69 
‘Sum of products’, 226-9 
‘Sum of squares’, 169-70, 209-10, 
212-13; residual sum, 230, 234 
Symmetrical distribution, 9, 10, 13, 30, 
61 

t statistic, 186-96; table, 188 
Tables: for normal curve, 63, 66; of 
chi-square, 171; of t, 188; of r, 193; 
of F, 199; of r and z, 201 
Tchebychef, P. L., 32-4, 272 
Tests of significance, 111, Chapter x 
Thiele, T. N., 39, 263, 265 
Tippett, L. H. C., 16, 83, 105, 125, 205, 
236, 260, 265, 268 
Tocher, K. D., 271 
Tolerance limits, 268-9 
Total correlation, 246 
Total frequency, 1 



Total probability, 21-2, 35 
Transformation of correlation coefficient, 
257 

Treatments, 218-21, 238, 240 
Trial, 19, 28-9, 110 
Tukey, J. W., 269 

Unbiassed estimate of population 
variance, 131, 191, Chapter xi 
Uncorrelated, 73, 81 
Uniform distribution, 14, 30, 32, 38, 44 
Unimodal, 3 

Universe, 109; see Population 
Uspensky, J. V., 20, 42-3, 266 

Vajda, S., 268 
Van Voorhis, W. R., 267 
Variable, 1; sampling of, 116-26 
Variance, 6, 26, 28-9, 39, 46, 48, 64, 
82-3, 136, 150, 163, 167; distribution 
of, 162, 169-70 

Variance ratio (F), 196-200; table of, 
199 

Variate, 24 

Variation, coefficient of, 7, 144 

Wald, A., 268-70, 272 
Walsh, J. E., 270 
Watson, G. S., 272 
Weighted mean, 1, 84, 90-6, 103 
Whittaker, Lucy, 49 
Whittle, P., 272 

Wilks, S. S., 258, 260, 265, 267-9 
Williams, E. J., 271 
Wishart, J., 206, 221, 226, 228, 230, 236, 
239, 266-7, 270 
Wold, H. O. A., 270 
Wolfowitz, J., 268-70, 272 

Yates, F., 203, 236, 266-7, 269, 272 
Yule, G. U., 16, 68, 83, 105, 108-9, 125, 
127, 143, 168, 206, 242, 256, 260, 263, 
266 

z statistic, 198, 200-3 





